{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# assignment1_1: Reproducing the lecture code for Syntax Tree and Probability Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 Given a grammar, how do we generate sentences from it?\n",
    "e.g. two_number: one `num` followed by another `num`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 A simple two_num implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
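    "import random\n",
    "\n",
    "# Grammar implemented by hand below (for reference only; this string is not parsed)\n",
    "two_number = \"\"\"\n",
    "2_num => num num\n",
    "num => 0 | 1 | 2 | 3 | 4 \n",
    "\"\"\"\n",
    "\n",
    "def two_num():\n",
    "    # 2_num => num num\n",
    "    return num() + num()\n",
    "\n",
    "def num():\n",
    "    # num => 0 | 1 | 2 | 3 | 4; split('|') keeps the surrounding spaces, e.g. ' 4 '\n",
    "    return random.choice(\"0 | 1 | 2 | 3 | 4 \".split('|'))"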
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' 4 '"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "num()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' 2  3 '"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "two_num()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Going further: a recursive numbers rule"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "two_number = \"\"\"\n",
    "numbers => num numbers | num\n",
    "num => 0 | 1 | 2 | 3 | 4 \n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def numbers():\n",
    "    # numbers => num numbers | num: stop with probability 0.5, otherwise recurse\n",
    "    if random.random() < 0.5:\n",
    "        return num()\n",
    "    else:\n",
    "        return num() + numbers()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' 4  3  4  1  4 '"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "numbers()"
   ]
  },
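  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `numbers()` stops after each digit with probability 0.5, the rule `numbers => num numbers | num` is also a small probability model: the number of digits follows a geometric distribution with mean $1/p = 2$ for stop probability $p = 0.5$. The cell below is a minimal sketch that checks this empirically; `numbers_with_stop` and its `p_stop` argument are hypothetical names introduced here, and the digits are returned without the spaces that `num()` keeps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def numbers_with_stop(p_stop=0.5, rng=random):\n",
    "    # Hypothetical iterative version of numbers(): one digit, then stop with probability p_stop\n",
    "    digits = [rng.choice('01234')]\n",
    "    while rng.random() >= p_stop:\n",
    "        digits.append(rng.choice('01234'))\n",
    "    return ''.join(digits)\n",
    "\n",
    "rng = random.Random(0)  # fixed seed so the estimate is reproducible\n",
    "samples = [numbers_with_stop(0.5, rng) for _ in range(20000)]\n",
    "average_length = sum(len(s) for s in samples) / len(samples)\n",
    "print(average_length)  # expected to be close to 1 / 0.5 = 2"
   ]
  },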
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 Generating simple Chinese sentences from grammar rules"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Simple combinations of noun phrases and verb phrases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [],
   "source": [
    "simple_grammar = \"\"\"\n",
    "sentence => noun_phrase verb_phrase\n",
    "noun_phrase => Article Adj* noun  \n",
    "Adj* => null | Adj Adj*\n",
    "verb_phrase => verb noun_phrase\n",
    "Article =>  一个 | 这个\n",
    "noun =>   女人 |  篮球 | 桌子 | 小猫\n",
    "verb => 看着   |  坐在 |  听着 | 看见\n",
    "Adj =>  蓝色的 | 好看的 | 小小的\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [],
   "source": [
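    "import random\n",
    "\n",
    "def adj():\n",
    "    # Adj => 蓝色的 | 好看的 | 小小的; split() drops the spaces around the chosen word\n",
    "    return random.choice(\"蓝色的 | 好看的 | 小小的\".split(\"|\")).split()[0]\n",
    "\n",
    "def adj_star():\n",
    "    # Adj* => null | Adj Adj*; the lambdas delay the recursive call until that branch is chosen\n",
    "    return random.choice([lambda : '', lambda : adj() + adj_star()])()"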
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'蓝色的'"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adj()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'好看的小小的小小的蓝色的'"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adj_star()"
   ]
  },
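  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`simple_grammar` describes a complete sentence, but only `Adj` and `Adj*` are implemented above. A minimal sketch of the remaining rules, written the same way with one function per rule; `pick`, `adj_phrase`, `article`, `noun`, `verb`, `noun_phrase`, `verb_phrase` and `sentence` are hypothetical names chosen so that `adj()` and `adj_star()` are not overwritten."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def pick(options):\n",
    "    # Choose one alternative from an 'a | b | c' string and drop the surrounding spaces\n",
    "    return random.choice(options.split('|')).strip()\n",
    "\n",
    "def article():\n",
    "    return pick('一个 | 这个')\n",
    "\n",
    "def noun():\n",
    "    return pick('女人 | 篮球 | 桌子 | 小猫')\n",
    "\n",
    "def verb():\n",
    "    return pick('看着 | 坐在 | 听着 | 看见')\n",
    "\n",
    "def adj_phrase():\n",
    "    # Adj* => null | Adj Adj*: stop (empty string) or add one more adjective, each with probability 0.5\n",
    "    if random.random() < 0.5:\n",
    "        return ''\n",
    "    return pick('蓝色的 | 好看的 | 小小的') + adj_phrase()\n",
    "\n",
    "def noun_phrase():\n",
    "    # noun_phrase => Article Adj* noun\n",
    "    return article() + adj_phrase() + noun()\n",
    "\n",
    "def verb_phrase():\n",
    "    # verb_phrase => verb noun_phrase\n",
    "    return verb() + noun_phrase()\n",
    "\n",
    "def sentence():\n",
    "    # sentence => noun_phrase verb_phrase\n",
    "    return noun_phrase() + verb_phrase()\n",
    "\n",
    "for _ in range(5):\n",
    "    print(sentence())"
   ]
  },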
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
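    "### 2.2 Going further with numbers: multiple grammars\n",
    "Read the rules from a grammar string, so that the program does not have to change when the operators change."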
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [],
   "source": [
    "two_number_add = \"\"\"\n",
    "two => num + num \n",
    "num => 0 | 1 | 2 | 3 | 4\n",
    "\"\"\"\n",
    "two_number_op = \"\"\"\n",
    "two => num + num | num - num\n",
    "num => 0 | 1 | 2 | 3 | 4\n",
    "\"\"\"\n",
    "\n",
    "def generate_grammar(grammar_str: str, target, split='=>'):\n",
    "    # target is not used here; it is kept so that existing calls still work\n",
    "    grammar = dict()\n",
    "\n",
    "    for line in grammar_str.split('\\n'):\n",
    "        # skip empty and whitespace-only lines\n",
    "        if not line.strip():\n",
    "            continue\n",
    "        # e.g. two => num + num | num - num\n",
    "        expression, formula = line.split(split)  # left-hand side and its alternatives\n",
    "        formulas = formula.split('|')\n",
    "        formulas = [f.split() for f in formulas]\n",
    "        grammar[expression.strip()] = formulas  # key: 'two', value: list of alternatives\n",
    "        # e.g. {'two': [['num', '+', 'num'], ['num', '-', 'num']], 'num': [['0'], ['1'], ['2'], ['3'], ['4']]}\n",
    "        print(expression, \":\", formulas)\n",
    "    print(grammar)\n",
    "    return grammar"
   ]
  },
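  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of a stricter parser that builds the same dictionary as `generate_grammar` for well-formed input, but without printing, without the unused `target` argument, skipping whitespace-only lines, and splitting each rule only on its first separator. `parse_grammar` is a hypothetical name, and splitting on the first separator is an assumption about how such rules should be read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_grammar(grammar_str, split='=>'):\n",
    "    # Hypothetical parser: {left-hand side: [alternative, ...]}, each alternative a list of symbols\n",
    "    grammar = {}\n",
    "    for line in grammar_str.splitlines():\n",
    "        if not line.strip():  # skip empty and whitespace-only lines\n",
    "            continue\n",
    "        lhs, rhs = line.split(split, 1)  # split on the first separator only\n",
    "        grammar[lhs.strip()] = [alt.split() for alt in rhs.split('|')]\n",
    "    return grammar\n",
    "\n",
    "example = '''\n",
    "two => num + num | num - num\n",
    "num => 0 | 1 | 2 | 3 | 4\n",
    "'''\n",
    "print(parse_grammar(example))"
   ]
  },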
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "two  : [['num', '+', 'num'], ['num', '-', 'num']]\n",
      "num  : [['0'], ['1'], ['2'], ['3'], ['4']]\n",
      "{'two': [['num', '+', 'num'], ['num', '-', 'num']], 'num': [['0'], ['1'], ['2'], ['3'], ['4']]}\n"
     ]
    }
   ],
   "source": [
    "op_grammar = generate_grammar(two_number_op, target=\"two\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [],
   "source": [
    "choice_a_expr = random.choice\n",
    "\n",
    "def generate_by_grammar(grammar: dict, target: str):\n",
    "    # target is a terminal if it is not a key of the grammar\n",
    "    if target not in grammar:\n",
    "        return target\n",
    "    expr = choice_a_expr(grammar[target])  # pick one alternative, e.g. ['num', '+', 'num']\n",
    "    print(\"expr\", expr)\n",
    "    return ' '.join(generate_by_grammar(grammar, t) for t in expr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "expr ['num', '-', 'num']\n",
      "expr ['2']\n",
      "expr ['1']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'2 - 1'"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generate_by_grammar(op_grammar, \"two\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Putting the code above together**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "expression  : [['expression', 'num_op'], ['num_op']]\n",
      "num_op  : [['num', 'op', 'num']]\n",
      "op  : [['+'], ['-'], ['*'], ['/']]\n",
      "num  : [['0'], ['1'], ['2'], ['3'], ['4']]\n",
      "expr ['expression', 'num_op']\n",
      "expr ['expression', 'num_op']\n",
      "expr ['num_op']\n",
      "expr ['num', 'op', 'num']\n",
      "expr ['1']\n",
      "expr ['/']\n",
      "expr ['0']\n",
      "expr ['num', 'op', 'num']\n",
      "expr ['0']\n",
      "expr ['-']\n",
      "expr ['4']\n",
      "expr ['num', 'op', 'num']\n",
      "expr ['1']\n",
      "expr ['/']\n",
      "expr ['3']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'1 / 0 0 - 4 1 / 3'"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "two_number_add = \"\"\"\n",
    "two => num + num \n",
    "num => 0 | 1 | 2 | 3 | 4\n",
    "\"\"\"\n",
    "number_ops = \"\"\"\n",
    "expression => expression num_op | num_op\n",
    "num_op => num op num\n",
    "op => + | - | * | /\n",
    "num => 0 | 1 | 2 | 3 | 4\n",
    "\"\"\"\n",
    "\n",
    "def generate_grammar(grammar_str: str, target, split='=>'):\n",
    "    # Parse a grammar string into {left-hand side: [alternative, ...]}; target is unused\n",
    "    grammar = dict()\n",
    "    for line in grammar_str.split('\\n'):\n",
    "        # skip empty and whitespace-only lines\n",
    "        if not line.strip():\n",
    "            continue\n",
    "        # e.g. two => num + num\n",
    "        expression, formula = line.split(split)  # left-hand side and its alternatives\n",
    "        formulas = formula.split('|')\n",
    "        formulas = [f.split() for f in formulas]\n",
    "        grammar[expression.strip()] = formulas  # key: 'two', value: list of alternatives\n",
    "        print(expression, \":\", formulas)\n",
    "    return grammar\n",
    "\n",
    "choice_a_expr = random.choice\n",
    "\n",
    "def generate_by_grammar(grammar: dict, target: str):\n",
    "    # target is a terminal if it is not a key of the grammar\n",
    "    if target not in grammar:\n",
    "        return target\n",
    "    expr = choice_a_expr(grammar[target])  # pick one alternative, e.g. ['num', '+', 'num']\n",
    "    print(\"expr\", expr)\n",
    "    # expand every symbol of the chosen alternative recursively\n",
    "    return ' '.join(generate_by_grammar(grammar, t) for t in expr)\n",
    "\n",
    "def generate_by_str(grammar_str, split, target):\n",
    "    # parse the grammar string, then expand target into a random sentence\n",
    "    grammar = generate_grammar(grammar_str, target, split)\n",
    "    return generate_by_grammar(grammar, target)\n",
    "\n",
    "generate_by_str(number_ops, split='=>', target=\"expression\")\n"
   ]
  },
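  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`expression => expression num_op | num_op` refers to itself, so the length of the output is unbounded: each extra `num_op` is added with probability 0.5. A minimal sketch of one common fix, a depth budget: once `max_depth` is reached, only alternatives that do not mention the current symbol again are allowed. `generate_bounded` and `max_depth` are hypothetical names, and this guard only handles direct self-recursion; a rule whose only alternative refers to itself (such as `年 => 数字 年`, where the symbol and the literal character share one name) still never terminates and has to be fixed in the grammar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def generate_bounded(grammar, target, depth=0, max_depth=10):\n",
    "    # Terminal symbol: not a key of the grammar\n",
    "    if target not in grammar:\n",
    "        return target\n",
    "    alternatives = grammar[target]\n",
    "    if depth >= max_depth:\n",
    "        # Past the depth budget, prefer alternatives that do not contain target itself\n",
    "        alternatives = [alt for alt in alternatives if target not in alt] or alternatives\n",
    "    expr = random.choice(alternatives)\n",
    "    return ' '.join(generate_bounded(grammar, t, depth + 1, max_depth) for t in expr)\n",
    "\n",
    "# The parsed form of number_ops, written out so the cell runs on its own\n",
    "number_ops_grammar = {\n",
    "    'expression': [['expression', 'num_op'], ['num_op']],\n",
    "    'num_op': [['num', 'op', 'num']],\n",
    "    'op': [['+'], ['-'], ['*'], ['/']],\n",
    "    'num': [['0'], ['1'], ['2'], ['3'], ['4']],\n",
    "}\n",
    "# With max_depth=3 the result has at most four 'num op num' groups\n",
    "print(generate_bounded(number_ops_grammar, 'expression', max_depth=3))"
   ]
  },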
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Chinese sentence generation, case 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In Westworld, the language of a 'human' can be defined as:\n",
    "\n",
    "human = \"\"\"\n",
    "human = 自己 寻找 活动\n",
    "自己 = 我 | 俺 | 我们 \n",
    "寻找 = 找找 | 想找点 \n",
    "活动 = 乐子 | 玩的\n",
    "\"\"\"\n",
    "\n",
    "# The language of a 'host' (receptionist) can be defined as:\n",
    "\n",
    "host = \"\"\"\n",
    "host = 寒暄 报数 询问 业务相关 结尾 \n",
    "报数 = 我是 数字 号 ,\n",
    "数字 = 单个数字 | 数字 单个数字 \n",
    "单个数字 = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 \n",
    "寒暄 = 称谓 打招呼 | 打招呼\n",
    "称谓 = 人称 ,\n",
    "人称 = 先生 | 女士 | 小朋友\n",
    "打招呼 = 你好 | 您好 \n",
    "询问 = 请问你要 | 您需要\n",
    "业务相关 = 玩玩 具体业务\n",
    "玩玩 = null\n",
    "具体业务 = 喝酒 | 打牌 | 打猎 | 赌博\n",
    "结尾 = 吗？\n",
    "\"\"\"\n",
    "opinion = \"\"\"\n",
    "opinion = 电影名称 标点符号 副词 形容词 标点符号\n",
    "电影名称 = 湄公河行动 | 战狼 | 唐顿庄园 | 哈利波特\n",
    "副词 = 很 | 颇 | 极 | 十分\n",
    "形容词 = 燃 | 还不错 | 难看 \n",
    "标点符号 = 。| !\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我们']\n",
      "expr ['想找点']\n",
      "expr ['乐子']\n",
      "我们 想找点 乐子\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我']\n",
      "expr ['找找']\n",
      "expr ['乐子']\n",
      "我 找找 乐子\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我']\n",
      "expr ['想找点']\n",
      "expr ['乐子']\n",
      "我 想找点 乐子\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我们']\n",
      "expr ['想找点']\n",
      "expr ['乐子']\n",
      "我们 想找点 乐子\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我们']\n",
      "expr ['找找']\n",
      "expr ['乐子']\n",
      "我们 找找 乐子\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我']\n",
      "expr ['找找']\n",
      "expr ['玩的']\n",
      "我 找找 玩的\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我']\n",
      "expr ['找找']\n",
      "expr ['乐子']\n",
      "我 找找 乐子\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['俺']\n",
      "expr ['想找点']\n",
      "expr ['玩的']\n",
      "俺 想找点 玩的\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我们']\n",
      "expr ['想找点']\n",
      "expr ['玩的']\n",
      "我们 想找点 玩的\n",
      "human  : [['自己', '寻找', '活动']]\n",
      "自己  : [['我'], ['俺'], ['我们']]\n",
      "寻找  : [['找找'], ['想找点']]\n",
      "活动  : [['乐子'], ['玩的']]\n",
      "expr ['自己', '寻找', '活动']\n",
      "expr ['我们']\n",
      "expr ['找找']\n",
      "expr ['乐子']\n",
      "我们 找找 乐子\n"
     ]
    }
   ],
   "source": [
    "# Randomly generate 10 sentences\n",
    "for _ in range(10):\n",
    "    print(generate_by_str(human, split='=', target=\"human\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 Chinese sentence generation, case 2: host"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['4']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "您好 我是 4 号 , 请问你要 null 打牌 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['女士']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['5']\n",
      "expr ['2']\n",
      "expr ['5']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['赌博']\n",
      "expr ['吗？']\n",
      "女士 , 您好 我是 5 2 5 号 , 请问你要 null 赌博 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['小朋友']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['3']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "小朋友 , 您好 我是 3 号 , 请问你要 null 打牌 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['8']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['赌博']\n",
      "expr ['吗？']\n",
      "您好 我是 8 号 , 请问你要 null 赌博 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['4']\n",
      "expr ['1']\n",
      "expr ['您需要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打猎']\n",
      "expr ['吗？']\n",
      "你好 我是 4 1 号 , 您需要 null 打猎 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['先生']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['4']\n",
      "expr ['6']\n",
      "expr ['您需要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "先生 , 你好 我是 4 6 号 , 您需要 null 打牌 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['女士']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['7']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "女士 , 你好 我是 7 号 , 请问你要 null 打牌 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['小朋友']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['9']\n",
      "expr ['您需要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['赌博']\n",
      "expr ['吗？']\n",
      "小朋友 , 您好 我是 9 号 , 您需要 null 赌博 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['2']\n",
      "expr ['7']\n",
      "expr ['5']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "你好 我是 2 7 5 号 , 请问你要 null 打牌 吗？\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['4']\n",
      "expr ['1']\n",
      "expr ['7']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打猎']\n",
      "expr ['吗？']\n",
      "你好 我是 4 1 7 号 , 请问你要 null 打猎 吗？\n"
     ]
    }
   ],
   "source": [
    "# Randomly generate 10 sentences\n",
    "for _ in range(10):\n",
    "    print(generate_by_str(host, split='=', target=\"host\"))"
   ]
  },
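  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the host sentences above, `玩玩 = null` is printed as the literal word `null`, and tokens are separated by spaces because `generate_by_grammar` joins them with `' '`, which reads unnaturally in Chinese. A minimal sketch of a generator that treats `null` as the empty string and joins tokens without spaces; `generate_text` and its `empty` parameter are hypothetical, and the grammar is written out as the dict that parsing `host` produces so the cell runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def generate_text(grammar, target, empty='null'):\n",
    "    # The placeholder symbol for 'nothing' expands to the empty string\n",
    "    if target == empty:\n",
    "        return ''\n",
    "    # Terminal symbol: not a key of the grammar\n",
    "    if target not in grammar:\n",
    "        return target\n",
    "    expr = random.choice(grammar[target])\n",
    "    # Chinese text needs no spaces between tokens\n",
    "    return ''.join(generate_text(grammar, t, empty) for t in expr)\n",
    "\n",
    "host_grammar = {\n",
    "    'host': [['寒暄', '报数', '询问', '业务相关', '结尾']],\n",
    "    '报数': [['我是', '数字', '号', ',']],\n",
    "    '数字': [['单个数字'], ['数字', '单个数字']],\n",
    "    '单个数字': [[str(d)] for d in range(1, 10)],\n",
    "    '寒暄': [['称谓', '打招呼'], ['打招呼']],\n",
    "    '称谓': [['人称', ',']],\n",
    "    '人称': [['先生'], ['女士'], ['小朋友']],\n",
    "    '打招呼': [['你好'], ['您好']],\n",
    "    '询问': [['请问你要'], ['您需要']],\n",
    "    '业务相关': [['玩玩', '具体业务']],\n",
    "    '玩玩': [['null']],\n",
    "    '具体业务': [['喝酒'], ['打牌'], ['打猎'], ['赌博']],\n",
    "    '结尾': [['吗？']],\n",
    "}\n",
    "\n",
    "for _ in range(3):\n",
    "    print(generate_text(host_grammar, 'host'))"
   ]
  },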
  {
   "cell_type": "code",
   "execution_count": 167,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "opinion  : [['时间', '地点', '影院', '电影名称', '副词', '形容词']]\n",
      "时间  : [['年', '月', '日']]\n",
      "年  : [['数字', '年']]\n",
      "月  : [['数字', '月']]\n",
      "日  : [['数字', '日']]\n",
      "数字  : [['1'], ['2'], ['3'], ['4']]\n",
      "地方  : [['南京'], ['上海'], ['广东']]\n",
      "影院  : [['卢米埃影城'], ['幸福蓝海国际影城'], ['时代华纳影城']]\n",
      "电影名称  : [['湄公河行动'], ['战狼'], ['唐顿庄园'], ['哈利波特']]\n",
      "副词  : [['很'], ['颇'], ['极'], ['十分']]\n",
      "形容词  : [['燃'], ['好看'], ['难看']]\n",
      "expr ['时间', '地点', '影院', '电影名称', '副词', '形容词']\n",
      "expr ['年', '月', '日']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['4']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['3']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['1']\n",
      "expr ['数字', '年']\n",
      "expr ['2']\n",
      "expr ['数字', '年']\n"
     ]
    },
    {
     "ename": "RecursionError",
     "evalue": "maximum recursion depth exceeded while calling a Python object",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mRecursionError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-167-dcda6b47825c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     14\u001b[0m \u001b[0;31m# 随机生成10句\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     15\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 16\u001b[0;31m     \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerate_by_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mopinion2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'='\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"opinion\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m<ipython-input-117-88539236a0ac>\u001b[0m in \u001b[0;36mgenerate_by_str\u001b[0;34m(grammar_str, split, target)\u001b[0m\n\u001b[1;32m     37\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate_by_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar_str\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     38\u001b[0m     \u001b[0mgrammar\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgenerate_grammar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar_str\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 39\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0mgenerate_by_grammar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mtarget\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     40\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     41\u001b[0m \u001b[0mgenerate_by_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumber_ops\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'=>'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"expression\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-117-88539236a0ac>\u001b[0m in \u001b[0;36mgenerate_by_grammar\u001b[0;34m(grammar, target)\u001b[0m\n\u001b[1;32m     33\u001b[0m     \u001b[0mexpr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mchoice_a_expr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mtarget\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# 选择value中的一个list, eg:['num', '+', 'num']\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     34\u001b[0m     \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"expr\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 35\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0;34m' '\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerate_by_grammar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mexpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     36\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     37\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate_by_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar_str\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-117-88539236a0ac>\u001b[0m in \u001b[0;36m<genexpr>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m     33\u001b[0m     \u001b[0mexpr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mchoice_a_expr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mtarget\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# 选择value中的一个list, eg:['num', '+', 'num']\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     34\u001b[0m     \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"expr\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 35\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0;34m' '\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerate_by_grammar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mexpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     36\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     37\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate_by_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar_str\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "... last 2 frames repeated, from the frame below ...\n",
      "\u001b[0;32m<ipython-input-117-88539236a0ac>\u001b[0m in \u001b[0;36mgenerate_by_grammar\u001b[0;34m(grammar, target)\u001b[0m\n\u001b[1;32m     33\u001b[0m     \u001b[0mexpr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mchoice_a_expr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mtarget\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# 选择value中的一个list, eg:['num', '+', 'num']\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     34\u001b[0m     \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"expr\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 35\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0;34m' '\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerate_by_grammar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mexpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     36\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     37\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgenerate_by_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrammar_str\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mRecursionError\u001b[0m: maximum recursion depth exceeded while calling a Python object"
     ]
    }
   ],
   "source": [
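    "# A rule name must not equal a literal word it produces: in '年 = 数字 年' the trailing\n",
    "# '年' would itself be expanded again forever (RecursionError), hence 年份/月份/日期.\n",
    "# Every symbol on a right-hand side (e.g. '地点') needs a rule with exactly that name,\n",
    "# otherwise it is never expanded.\n",
    "opinion2 = \"\"\"\n",
    "opinion = 时间 地点 影院 电影名称 副词 形容词\n",
    "时间 = 年份 月份 日期\n",
    "年份 = 数字 年\n",
    "月份 = 数字 月\n",
    "日期 = 数字 日\n",
    "数字 = 1 | 2 | 3 | 4\n",
    "地点 = 南京 | 上海 | 广东\n",
    "影院 = 卢米埃影城 | 幸福蓝海国际影城 | 时代华纳影城\n",
    "电影名称 = 湄公河行动 | 战狼 | 唐顿庄园 | 哈利波特\n",
    "副词 = 很 | 颇 | 极 | 十分\n",
    "形容词 = 燃 | 好看 | 难看\n",
    "\"\"\"\n",
    "# Randomly generate 10 sentences\n",
    "for i in range(10):\n",
    "    print(generate_by_str(opinion2, split='=', target=\"opinion\"))"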
   ]
  },
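  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rule such as `年 = 数字 年` can never finish expanding when `年` is itself a rule name: every expansion of `年` produces another `年`, which is why generating from such a grammar ends in a RecursionError. A standard way to catch this before generating is the *productive symbol* check for context-free grammars: a symbol is productive if it is a terminal, or if at least one of its alternatives consists only of productive symbols; repeat until nothing changes, and every rule left over can only expand forever. The sketch below assumes the same `name = a b | c` format (one rule per line, whitespace-separated symbols); `parse_grammar` and `find_unproductive` are hypothetical helpers written for this example, not the `generate_grammar` function defined earlier.\n",
    "\n",
    "```python\n",
    "def parse_grammar(text, sep='='):\n",
    "    # Hypothetical parser: one rule per line, 'name = a b | c', symbols separated by spaces\n",
    "    grammar = {}\n",
    "    for line in text.strip().splitlines():\n",
    "        if not line.strip():\n",
    "            continue\n",
    "        name, rhs = line.split(sep, 1)\n",
    "        grammar[name.strip()] = [alt.split() for alt in rhs.split('|')]\n",
    "    return grammar\n",
    "\n",
    "\n",
    "def find_unproductive(grammar):\n",
    "    # Symbols that are not rule names are terminals and count as productive\n",
    "    productive = set()\n",
    "    changed = True\n",
    "    while changed:  # repeat until a fixpoint is reached\n",
    "        changed = False\n",
    "        for name, alternatives in grammar.items():\n",
    "            if name in productive:\n",
    "                continue\n",
    "            for alt in alternatives:\n",
    "                if all(s in productive or s not in grammar for s in alt):\n",
    "                    productive.add(name)\n",
    "                    changed = True\n",
    "                    break\n",
    "    # Rules that never became productive can only expand forever\n",
    "    return sorted(set(grammar) - productive)\n",
    "\n",
    "\n",
    "bad = '''\n",
    "时间 = 年 月\n",
    "年 = 数字 年\n",
    "月 = 数字 月\n",
    "数字 = 1 | 2 | 3 | 4\n",
    "'''\n",
    "good = '''\n",
    "时间 = 年份 月份\n",
    "年份 = 数字 年\n",
    "月份 = 数字 月\n",
    "数字 = 1 | 2 | 3 | 4\n",
    "'''\n",
    "print(find_unproductive(parse_grammar(bad)))   # ['年', '时间', '月']\n",
    "print(find_unproductive(parse_grammar(good)))  # []\n",
    "```\n",
    "\n",
    "Renaming the rules (`年份`, `月份`) so that the literal words `年` and `月` are no longer rule names makes every rule productive. The check does not catch a misspelled rule name (for example using `地点` in one rule while defining `地方`): the misspelled symbol is simply treated as a terminal."
   ]
  },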
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
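    "### 2.5 Summary\n",
    "- A grammar can be implemented with the simplest possible functions\n",
    "- Recursion handles more complex grammars\n",
    "- ELIZA, the first chatbot, was built on a rules engine (procedure driven). Later, people began to start from the data instead (data driven): the data controls the behavior, so humans no longer have to work out every detail by hand. From then on, the goal became a program that stays the same even when the problem changes.\n",
    "    ```\n",
    "    machine learning \n",
    "    based on statistical, data-driven methods\n",
    "    ```\n",
    "- Rules engine: when the input rules change, the program does not need to change\n"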
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3 Learning Python syntax"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
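    "### 3.1 lambda functions\n",
    "In Python, a lambda function is also called an anonymous function, i.e. a function that has no name of its own. It lets you define a single-expression function inline and can be used anywhere a function is expected, unlike a function defined with `def`.\n",
    "\n",
    "Differences between lambda and def:\n",
    "\n",
    "1. A function created with def has a name; a lambda does not (its `__name__` is `'<lambda>'`).\n",
    "2. lambda returns a function object without binding it to an identifier, whereas def binds the function object to a variable (the function name).\n",
    "3. lambda is an expression, whereas def is a statement.\n",
    "4. After the `:` of a lambda there can be only one expression; a def body can contain many statements.\n",
    "5. Statements such as `if`, `for` or `return` cannot appear in a lambda (a conditional expression `a if cond else b` or a call such as `print(x)` can); def has no such restriction.\n",
    "6. lambda is generally used for simple functions, whereas def can define complex ones.\n",
    "7. lambda is meant for short, throwaway use; a function that other code should reuse is normally defined with def.\n",
    "\n",
    "lambda syntax: `lambda parameters: expression`\n",
    "~~~\n",
    "lambda [arg1 [, arg2, ..... argn]] : expression\n",
    "~~~\n"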
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A lambda with one parameter\n",
    "g = lambda x : x ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9\n"
     ]
    }
   ],
   "source": [
    "print(g(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "27\n"
     ]
    }
   ],
   "source": [
    "# A lambda with several parameters\n",
    "g = lambda x, y, z : (x+y)**z\n",
    "print(g(1,2,3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Each lambda evaluates to a function object, so lambdas can be stored in a list and called later\n",
    "list_a = [lambda a : a ** 3, lambda b : b**3]\n",
    "list_a[1](2)"
   ]
  },
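  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch that checks several of the points in 3.1: a lambda is an ordinary function object that behaves like a one-expression `def`, its `__name__` is `'<lambda>'`, and its body may contain a conditional expression but not an `if` statement. The names `square_def`, `square_lambda`, `sign` and `words` are only for illustration; binding a lambda to a name is done here purely for comparison (PEP 8 recommends `def` for named functions).\n",
    "\n",
    "```python\n",
    "def square_def(x):\n",
    "    return x ** 2\n",
    "\n",
    "square_lambda = lambda x: x ** 2\n",
    "\n",
    "# Both are plain function objects and give the same result\n",
    "print(square_def(3), square_lambda(3))                # 9 9\n",
    "print(square_def.__name__, square_lambda.__name__)    # square_def <lambda>\n",
    "\n",
    "# A conditional expression is allowed inside a lambda; an if statement is not\n",
    "sign = lambda x: 'positive' if x > 0 else ('zero' if x == 0 else 'negative')\n",
    "print([sign(v) for v in (-2, 0, 5)])                  # ['negative', 'zero', 'positive']\n",
    "\n",
    "# Typical use: a short, throwaway key function\n",
    "words = ['banana', 'fig', 'apple']\n",
    "print(sorted(words, key=lambda w: len(w)))            # ['fig', 'apple', 'banana']\n",
    "```"
   ]
  },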
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Using random.choice\n",
    "~~~\n",
    "import random\n",
    "\n",
    "random.choice(seq)\n",
    "\n",
    "seq: any non-empty sequence, e.g. a list, tuple or string (an empty sequence raises IndexError)\n",
    "~~~"
   ]
  },
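  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short sketch of `random.choice` on the three kinds of sequence listed above, plus `random.choices` (available since Python 3.6), which samples with replacement and accepts optional weights. Weights like these could, for example, make some alternatives of a grammar rule more likely than others; the words and weights below are made up for illustration.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "random.seed(0)  # fix the seed so repeated runs give the same picks\n",
    "\n",
    "print(random.choice([1, 2, 3]))         # an element of a list\n",
    "print(random.choice(('南京', '上海')))   # an element of a tuple\n",
    "print(random.choice('abc'))             # a single character of a string\n",
    "\n",
    "# Weighted sampling with replacement: '好看' should come up about 3 times as often as '难看'\n",
    "picks = random.choices(['好看', '难看'], weights=[3, 1], k=1000)\n",
    "print(picks.count('好看'), picks.count('难看'))\n",
    "\n",
    "# An empty sequence raises IndexError\n",
    "try:\n",
    "    random.choice([])\n",
    "except IndexError as err:\n",
    "    print('IndexError:', err)\n",
    "```"
   ]
  },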
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
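    "## 4 Language models\n",
    "- Kai-Fu Lee worked on language models during his PhD\n",
    "\n",
    "\n",
    "### 4.1 Review\n",
    "- Conditional probability\n",
    "- Independence\n",
    "```\n",
    "Over 365 days you were late 30 times:\n",
    "Pr(late) = 30/365\n",
    "Over 365 days you had diarrhea on 60 days, and were late on 20 of them:\n",
    "Pr(late|diarrhea) = Pr(late & diarrhea) / Pr(diarrhea)\n",
    "                  = (20/365) / (60/365)\n",
    "                  = 20/60\n",
    "Being late is independent of a car crash in 伊利, so:\n",
    "Pr(late|crash in 伊利) = Pr(late & crash in 伊利) / Pr(crash in 伊利)\n",
    "                       = Pr(late) * Pr(crash in 伊利) / Pr(crash in 伊利)\n",
    "                       = Pr(late)\n",
    "Pr(late|stomachache & crash in 伊利) = Pr(late|stomachache)\n",
    "                                     = Pr(late & stomachache) / Pr(stomachache)\n",
    "                                     ≈ Count(late & stomachache) / Count(stomachache)\n",
    "```\n",
    "PS: 其实就和随机森林原理一样 ('it is actually the same principle as a random forest'); this sentence is also the example used below.\n",
    "```\n",
    "Assume each word depends only on its nearest neighbor (here, the word that follows it):\n",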
    "\n",
    "-> Pr(其实&就和&随机森林&原理&一样)\n",
    "-> Pr(其实|就和&随机森林&原理&一样)Pr(就和&随机森林&原理&一样)\n",
    "-> Pr(其实|就和)Pr(就和|随机森林&原理&一样)Pr(随机森林&原理&一样)\n",
    "-> Pr(其实|就和)Pr(就和|随机森林)Pr(随机森林&原理&一样)\n",
    "-> Pr(其实|就和)Pr(就和|随机森林)Pr(随机森林|原理&一样)Pr(原理&一样)\n",
    "-> Pr(其实|就和)Pr(就和|随机森林)Pr(随机森林|原理)Pr(原理&一样)\n",
    "-> Pr(其实|就和)Pr(就和|随机森林)Pr(随机森林|原理)Pr(原理|一样)Pr(一样)\n",
    "```\n",
    "\n",
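    "$$ Pr(其实|就和) = \\frac{\\#其实就和}{\\#就和}$$ (#: number of occurrences in the corpus)\n",
    "\n",
    "$$ Pr(W_1|W_2) = \\frac{\\#W_1W_2}{\\#W_2}$$ \n",
    "\n",
    "Generalizing to a sentence of n words (the approximation comes from the assumption above):\n",
    "$$ Pr(sentence) = Pr(W_1W_2 \\cdots W_n) \\approx \\prod_{i=1}^{n-1} \\frac{\\#W_iW_{i+1}}{\\#W_{i+1}} \\cdot Pr(W_n)$$\n",
    "where $Pr(W_n) = \\#W_n / N$ and $N$ is the total number of words in the corpus."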
   ]
  },
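  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last formula can be sketched directly with counts. Following the derivation above, each word is conditioned on the word that follows it, $Pr(W_i|W_{i+1}) = \\#W_iW_{i+1} / \\#W_{i+1}$, and the last word gets its unigram probability $Pr(W_n) = \\#W_n / N$, where $N$ is the total number of words. The tiny, already-segmented corpus and the function name `sentence_prob` are made up for illustration; no smoothing is applied, so any unseen word pair makes the whole probability 0.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "# Toy corpus, already segmented into words (made up for illustration)\n",
    "corpus = [\n",
    "    ['其实', '就和', '随机森林', '原理', '一样'],\n",
    "    ['其实', '就和', '原理', '一样'],\n",
    "    ['原理', '一样'],\n",
    "]\n",
    "\n",
    "unigram = Counter(w for sent in corpus for w in sent)                        # #W\n",
    "bigram = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))   # #W_i W_{i+1}\n",
    "total = sum(unigram.values())                                                 # N\n",
    "\n",
    "\n",
    "def sentence_prob(words):\n",
    "    # Pr(W_1..W_n) ~ prod_{i=1}^{n-1} #(W_i W_{i+1}) / #W_{i+1} * #W_n / N\n",
    "    prob = unigram[words[-1]] / total\n",
    "    for w, nxt in zip(words, words[1:]):\n",
    "        if unigram[nxt] == 0:  # unseen word: no estimate without smoothing\n",
    "            return 0.0\n",
    "        prob *= bigram[(w, nxt)] / unigram[nxt]\n",
    "    return prob\n",
    "\n",
    "\n",
    "print(sentence_prob(['其实', '就和', '原理', '一样']))  # 1/11, about 0.0909\n",
    "print(sentence_prob(['一样', '其实']))                 # 0.0, the pair never occurs\n",
    "```"
   ]
  },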
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import random\n",
    "random.choice(range(100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = \"data/sqlResult_1558435.csv\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### This dataset will be used often later on"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "content = pd.read_csv(filename, encoding='gb18030')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>author</th>\n",
       "      <th>source</th>\n",
       "      <th>content</th>\n",
       "      <th>feature</th>\n",
       "      <th>title</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>89617</td>\n",
       "      <td>NaN</td>\n",
       "      <td>快科技@http://www.kkj.cn/</td>\n",
       "      <td>此外，自本周（6月12日）起，除小米手机6等15款机型外，其余机型已暂停更新发布（含开发版/...</td>\n",
       "      <td>{\"type\":\"科技\",\"site\":\"cnbeta\",\"commentNum\":\"37\"...</td>\n",
       "      <td>小米MIUI 9首批机型曝光：共计15款</td>\n",
       "      <td>http://www.cnbeta.com/articles/tech/623597.htm</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>89616</td>\n",
       "      <td>NaN</td>\n",
       "      <td>快科技@http://www.kkj.cn/</td>\n",
       "      <td>骁龙835作为唯一通过Windows 10桌面平台认证的ARM处理器，高通强调，不会因为只考...</td>\n",
       "      <td>{\"type\":\"科技\",\"site\":\"cnbeta\",\"commentNum\":\"15\"...</td>\n",
       "      <td>骁龙835在Windows 10上的性能表现有望改善</td>\n",
       "      <td>http://www.cnbeta.com/articles/tech/623599.htm</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>89615</td>\n",
       "      <td>NaN</td>\n",
       "      <td>快科技@http://www.kkj.cn/</td>\n",
       "      <td>此前的一加3T搭载的是3400mAh电池，DashCharge快充规格为5V/4A。\\r\\n...</td>\n",
       "      <td>{\"type\":\"科技\",\"site\":\"cnbeta\",\"commentNum\":\"18\"...</td>\n",
       "      <td>一加手机5细节曝光：3300mAh、充半小时用1天</td>\n",
       "      <td>http://www.cnbeta.com/articles/tech/623601.htm</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>89614</td>\n",
       "      <td>NaN</td>\n",
       "      <td>新华社</td>\n",
       "      <td>这是6月18日在葡萄牙中部大佩德罗冈地区拍摄的被森林大火烧毁的汽车。新华社记者张立云摄\\r\\n</td>\n",
       "      <td>{\"type\":\"国际新闻\",\"site\":\"环球\",\"commentNum\":\"0\",\"j...</td>\n",
       "      <td>葡森林火灾造成至少62人死亡 政府宣布进入紧急状态（组图）</td>\n",
       "      <td>http://world.huanqiu.com/hot/2017-06/10866126....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>89613</td>\n",
       "      <td>胡淑丽_MN7479</td>\n",
       "      <td>深圳大件事</td>\n",
       "      <td>（原标题：44岁女子跑深圳约会网友被拒，暴雨中裸身奔走……）\\r\\n@深圳交警微博称：昨日清...</td>\n",
       "      <td>{\"type\":\"新闻\",\"site\":\"网易热门\",\"commentNum\":\"978\",...</td>\n",
       "      <td>44岁女子约网友被拒暴雨中裸奔 交警为其披衣相随</td>\n",
       "      <td>http://news.163.com/17/0618/00/CN617P3Q0001875...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id      author                  source  \\\n",
       "0  89617         NaN  快科技@http://www.kkj.cn/   \n",
       "1  89616         NaN  快科技@http://www.kkj.cn/   \n",
       "2  89615         NaN  快科技@http://www.kkj.cn/   \n",
       "3  89614         NaN                     新华社   \n",
       "4  89613  胡淑丽_MN7479                   深圳大件事   \n",
       "\n",
       "                                             content  \\\n",
       "0  此外，自本周（6月12日）起，除小米手机6等15款机型外，其余机型已暂停更新发布（含开发版/...   \n",
       "1  骁龙835作为唯一通过Windows 10桌面平台认证的ARM处理器，高通强调，不会因为只考...   \n",
       "2  此前的一加3T搭载的是3400mAh电池，DashCharge快充规格为5V/4A。\\r\\n...   \n",
       "3    这是6月18日在葡萄牙中部大佩德罗冈地区拍摄的被森林大火烧毁的汽车。新华社记者张立云摄\\r\\n   \n",
       "4  （原标题：44岁女子跑深圳约会网友被拒，暴雨中裸身奔走……）\\r\\n@深圳交警微博称：昨日清...   \n",
       "\n",
       "                                             feature  \\\n",
       "0  {\"type\":\"科技\",\"site\":\"cnbeta\",\"commentNum\":\"37\"...   \n",
       "1  {\"type\":\"科技\",\"site\":\"cnbeta\",\"commentNum\":\"15\"...   \n",
       "2  {\"type\":\"科技\",\"site\":\"cnbeta\",\"commentNum\":\"18\"...   \n",
       "3  {\"type\":\"国际新闻\",\"site\":\"环球\",\"commentNum\":\"0\",\"j...   \n",
       "4  {\"type\":\"新闻\",\"site\":\"网易热门\",\"commentNum\":\"978\",...   \n",
       "\n",
       "                           title  \\\n",
       "0           小米MIUI 9首批机型曝光：共计15款   \n",
       "1     骁龙835在Windows 10上的性能表现有望改善   \n",
       "2      一加手机5细节曝光：3300mAh、充半小时用1天   \n",
       "3  葡森林火灾造成至少62人死亡 政府宣布进入紧急状态（组图）   \n",
       "4       44岁女子约网友被拒暴雨中裸奔 交警为其披衣相随   \n",
       "\n",
       "                                                 url  \n",
       "0     http://www.cnbeta.com/articles/tech/623597.htm  \n",
       "1     http://www.cnbeta.com/articles/tech/623599.htm  \n",
       "2     http://www.cnbeta.com/articles/tech/623601.htm  \n",
       "3  http://world.huanqiu.com/hot/2017-06/10866126....  \n",
       "4  http://news.163.com/17/0618/00/CN617P3Q0001875...  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "content.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
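    "#### * Common encoding error\n",
    "~~~\n",
    "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte\n",
    "~~~\n",
    "- chardet claims to detect the encoding automatically, but in practice it often guesses wrong\n",
    "- Fix: look up the list of standard Python encodings (search for 'python encoding') and pass the right one explicitly, e.g. `encoding='gb18030'` for this dataset: <https://docs.python.org/2.4/lib/standard-encodings.html>\n"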
   ]
  },
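  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The error above means the bytes are not valid UTF-8 (this dataset is read with `encoding='gb18030'`). When the encoding is unknown, one pragmatic option is to try a short list of candidate codecs in order and keep the first that decodes without error. `decode_with_fallback` and its candidate list are assumptions made for this sketch; a successful decode does not prove the guess is right, because some byte sequences are valid in more than one encoding.\n",
    "\n",
    "```python\n",
    "def decode_with_fallback(data, encodings=('utf-8', 'gb18030')):\n",
    "    # Try each candidate codec in order; return the decoded text and the codec used\n",
    "    for enc in encodings:\n",
    "        try:\n",
    "            return data.decode(enc), enc\n",
    "        except UnicodeDecodeError:\n",
    "            continue  # not valid in this encoding, try the next one\n",
    "    raise ValueError('none of the candidate encodings worked: %s' % (encodings,))\n",
    "\n",
    "\n",
    "raw = '此外，自本周起'.encode('gb18030')  # simulate the bytes of a GB-encoded file\n",
    "text, used = decode_with_fallback(raw)\n",
    "print(used, text)  # gb18030 此外，自本周起\n",
    "```\n",
    "\n",
    "For a real file, the raw bytes can be read with `open(filename, 'rb').read()`, and the codec that worked can then be passed to `pd.read_csv(filename, encoding=...)`."
   ]
  },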
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
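    "#### Data preprocessing\n",
    "The text contains many characters that are not words (punctuation, spaces, line breaks, etc.), so we use a regular expression to extract only the word characters."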
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "articles = content['content'].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "89611"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(articles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'此外，自本周（6月12日）起，除小米手机6等15款机型外，其余机型已暂停更新发布（含开发版/体验版内测，稳定版暂不受影响），以确保工程师可以集中全部精力进行系统优化工作。有人猜测这也是将精力主要用到MIUI 9的研发之中。\\r\\nMIUI 8去年5月发布，距今已有一年有余，也是时候更新换代了。\\r\\n当然，关于MIUI 9的确切信息，我们还是等待官方消息。\\r\\n'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "articles[0] # contains many special symbols, so it needs preprocessing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def token(string):\n",
    "    # \\w+ matches runs of word characters: letters (including Chinese characters), digits and underscore\n",
    "    return re.findall(r'\\w+', string)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
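    "```\n",
    "\\w matches word characters\n",
    "For Unicode (str) patterns:\n",
    "Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as digits and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.\n",
    "For 8-bit (bytes) patterns:\n",
    "Matches characters considered alphanumeric in the ASCII character set, plus the underscore; this is equivalent to [a-zA-Z0-9_]. If the LOCALE flag is used, matches characters considered alphanumeric in the current locale, plus the underscore.\n",
    "```"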
   ]
  },
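  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick check of the behavior described above on mixed Chinese/English text: with a `str` pattern, `\\w+` keeps Chinese characters, Latin letters, digits and underscores together and splits on punctuation such as the full-width comma; with `re.ASCII`, only `[a-zA-Z0-9_]` counts as a word character. The sample string is made up for illustration.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "sample = '小米MIUI 9，首批15款！snake_case'\n",
    "\n",
    "# Unicode (default for str patterns): Chinese characters are word characters\n",
    "print(re.findall(r'\\w+', sample))                  # ['小米MIUI', '9', '首批15款', 'snake_case']\n",
    "\n",
    "# ASCII flag: only [a-zA-Z0-9_] matches\n",
    "print(re.findall(r'\\w+', sample, flags=re.ASCII))  # ['MIUI', '9', '15', 'snake_case']\n",
    "```"
   ]
  },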
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['此外',\n",
       " '自本周',\n",
       " '6月12日',\n",
       " '起',\n",
       " '除小米手机6等15款机型外',\n",
       " '其余机型已暂停更新发布',\n",
       " '含开发版',\n",
       " '体验版内测',\n",
       " '稳定版暂不受影响',\n",
       " '以确保工程师可以集中全部精力进行系统优化工作',\n",
       " '有人猜测这也是将精力主要用到MIUI',\n",
       " '9的研发之中',\n",
       " 'MIUI',\n",
       " '8去年5月发布',\n",
       " '距今已有一年有余',\n",
       " '也是时候更新换代了',\n",
       " '当然',\n",
       " '关于MIUI',\n",
       " '9的确切信息',\n",
       " '我们还是等待官方消息']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "token(articles[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: jieba in /Users/stone/anaconda3/lib/python3.7/site-packages (0.40)\r\n"
     ]
    }
   ],
   "source": [
    "!pip install jieba"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### * Problem: the pip configuration file was broken\n",
    "Fix: open the configuration file in a local text editor and re-edit its contents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['结巴',\n",
       " '分词',\n",
       " '好好',\n",
       " '啊',\n",
       " '后',\n",
       " ' ',\n",
       " 'i',\n",
       " ' ',\n",
       " '啊',\n",
       " '手',\n",
       " ' ',\n",
       " 'i',\n",
       " '后',\n",
       " '打',\n",
       " '哦',\n",
       " '电话',\n",
       " '撒']"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import jieba\n",
    "list(jieba.cut('结巴分词好好啊后 i 啊手 i后打哦电话撒'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'在外国名著麦田里的守望者中作者想要守护麦田里如自己内心一般纯真的孩子们而驻村干部们也在这个炎热的夏天里撸袖子上阵真正做起了村民们的麦田守望者三夏时节不等人你看到了吗不停翻涌起伏仿若铺陈至天边的金黄麦浪中那若隐若现的人影是自治区新闻出版广电局驻和田市肖尔巴格乡合尼村工作队的队员与工作队组织的青年志愿者在这个炎热的夏季他们深入田间地头帮助村民们收割小麦扛起收麦机麦田中的每个人都显得兴致勃勃一天下来就近22亩小麦收割完毕志愿者麦麦提亚森擦去满脸的汗水高兴地告诉驻村队员我们青年志愿者应该多做贡献为村里的脱贫致富出把力工作队带着我们为村里的老人服务看到那些像我爷爷奶奶一样的老人赞许感谢的目光我体会到了帮助他人的快乐自治区新闻出版广电局驻村工作队孙敏艾力依布拉音麦收时节我们在一起6月中旬的和田墨玉麦田金黄静待收割6月14日15日两天自治区高级人民法院驻和田地区墨玉县吐外特乡罕勒克艾日克村工作队与48名村民志愿者一道帮助村里29户有需要的村民进行小麦收割工作田间地头罕勒克艾日克村志愿队的红旗迎风飘扬格外醒目10余台割麦机一起轰鸣男人们在用机器收割小麦的同时几名妇女也加入到志愿队构成了一道美丽的麦收风景休息空闲工作队员和村民们坐在树荫下田埂上互相问好聊天语言交流有困难就用手势动作比划着聊天有趣地交流方式不时引来阵阵欢笑大家在一同享受丰收和喜悦也一同增进着彼此的情感和友谊自治区高级人民法院驻村工作队周春梅艾地艾木阿不拉细看稻菽千重浪6月15日自治区煤田灭火工程局的干部职工们再一次跋涉1000多公里来到了叶城县萨依巴格乡阿亚格欧尔达贝格村见到了自己的亲戚现场处处都透出掩盖不住的喜悦一声声亲切的谢谢一个个结实的拥抱都透露出浓浓的亲情没坐一会儿在嘘寒问暖中大家了解到在麦收的关键时刻部分村民家中却存在收割难的问题小麦成熟期短收获的时间集中天气的变化对小麦最终产量的影响极大如果不能及时收割会有不小损失的于是大家几乎立刻就决定要帮助亲戚们收割麦子在茂密的麦地里干部们每人手持一把镰刀一字排开挽起衣袖卷起裤腿挥舞着镰刀进行着无声的竞赛骄阳似火汗如雨下但这都挡不住大家的热情随着此起彼伏的镰刀割倒麦子的刷刷声响不一会一束束沉甸甸的麦穗就被整齐地堆放了起来当看到自己亲手收割的金黄色麦穗被一簇簇地打成捆运送到晒场每个人的脸上都露出了灿烂的笑容自治区煤田灭火工程局驻村工作队马浩南这是一个收获多多的季节6月13日清晨6时许和田地区民丰县若雅乡特开墩村的麦田里已经传来马达轰鸣声原来是自治区质监局驻村工作队趁着天气尚且凉爽开始了麦田的收割工作忙碌间隙志愿者队伍搬来清凉的水村民们拎来鲜甜的西瓜抹一把汗水吃一牙西瓜甜蜜的汁水似乎流进了每一个人的心里说起割麦子对于生活在这片土地上的村民来说是再平常不过的事但是对于工作队队员们来说却是陌生的自治区质监局驻民丰县若克雅乡博斯坦村工作队队员们一开始觉得十几个人一起收割二亩地应该会挺快的结果却一点不简单镰刀拿到自己手里割起来考验才真正的开始大家弓着腰弯着腿亦步亦趋手上挥舞着镰刀时刻注意不要让镰刀割到自己脚下还要留心不要把套种的玉米苗踩伤不一会儿就已经汗流浃背了抬头看看身边的村民早就远远地割到前面去了只有今年已经56岁的工作队队长李树刚有割麦经验多少给队员们挽回了些面子赶不上村民们割麦子的速度更不要说搞定收割机这台大家伙了现代化的机械收割能成倍提升小麦的收割速度李树刚说不过能有这样的体验拉近和村民的距离也是很难得的体验自治区质监局驻村工作队王辉马君刚我们是麦田的守护者为了应对麦收新疆银监局驻和田县塔瓦库勒乡也先巴扎村工作队一早就从经济支援和人力支援两方面做好了准备一方面工作队帮村里购入了5台小麦收割机另一边还组织村干部青年团员等组成了6支近百人的收割先锋突击队帮助村民们抢收麦子看着及时归仓的麦子村民们喜得合不拢嘴纷纷摘下自家杏树上的杏子送给工作队金黄的麦穗温暖了村民们的心香甜的杏子温暖了工作队员的心麦子加杏子拉近了村民和队员们的心新疆银监局驻村工作队王继发免责声明本文仅代表作者个人观点与环球网无关其原创性以及文中陈述文字和内容未经本站证实对本文以及其中全部或者部分内容文字的真实性完整性及时性本站不作任何保证或承诺请读者仅作参考并请自行核实相关内容'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "''.join(token(articles[110]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "articles_clean = [''.join(token(str(a))) for a in articles]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "89611"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(articles_clean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: save important intermediate results to disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('article_9k.txt', 'w', encoding='utf-8') as f:\n",
    "    for a in articles_clean:\n",
    "        f.write(a + '\\n')  # one cleaned article per line"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "article_9k.txt                         assignment1_3-编程实战.ipynb\r\n",
      "assignment1_1-课堂内容复现.ipynb\r\n"
     ]
    }
   ],
   "source": [
    "!ls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cut(string): return list(jieba.cut(string))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "Token = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('article_9k.txt', encoding='utf-8') as f:\n",
    "    Token = cut(f.read())  # tokenize the whole file in one pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n",
      "此外\n",
      "自\n",
      "本周\n",
      "6\n",
      "月\n",
      "12\n",
      "日起\n",
      "除\n",
      "小米\n",
      "手机\n"
     ]
    }
   ],
   "source": [
    "print(type(Token))\n",
    "for i in range(10):\n",
    "    print(Token[i])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "本周\n"
     ]
    }
   ],
   "source": [
    "# store the tokens in a dict keyed by position, then save it to disk\n",
    "line_dict = {}\n",
    "with open('article_9k_cut.txt', 'w', encoding='utf-8') as f:\n",
    "    for i, line in enumerate(Token):\n",
    "        line_dict[i] = line\n",
    "    f.write(str(line_dict))\n",
    "print(line_dict[2])"
   ]
  },
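  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Writing `str(line_dict)` produces a Python literal that can only be read back by parsing it again (for example with `ast.literal_eval`). One alternative is JSON via the standard `json` module. The sketch below assumes a plain token list is all you need to keep. `ensure_ascii=False` keeps the Chinese readable in the file. The file name `tokens_demo.json` is hypothetical.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "tokens = ['此外', '自', '本周', '6', '月']  # first few tokens from the output above\n",
    "path = os.path.join(tempfile.gettempdir(), 'tokens_demo.json')  # hypothetical file name\n",
    "\n",
    "with open(path, 'w', encoding='utf-8') as f:\n",
    "    json.dump(tokens, f, ensure_ascii=False)\n",
    "\n",
    "# read it back: the list round-trips unchanged\n",
    "with open(path, encoding='utf-8') as f:\n",
    "    restored = json.load(f)\n",
    "print(restored == tokens)\n",
    "```"
   ]
  },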
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "100\n",
      "200\n",
      "300\n",
      "400\n",
      "500\n",
      "600\n",
      "700\n",
      "800\n",
      "900\n",
      "1000\n",
      "1100\n",
      "1200\n",
      "1300\n",
      "1400\n",
      "1500\n",
      "1600\n",
      "1700\n",
      "1800\n",
      "1900\n",
      "2000\n",
      "2100\n",
      "2200\n",
      "2300\n",
      "2400\n",
      "2500\n",
      "2600\n",
      "2700\n",
      "2800\n",
      "2900\n",
      "3000\n",
      "3100\n",
      "3200\n",
      "3300\n",
      "3400\n",
      "3500\n",
      "3600\n",
      "3700\n",
      "3800\n",
      "3900\n",
      "4000\n",
      "4100\n",
      "4200\n",
      "4300\n",
      "4400\n",
      "4500\n",
      "4600\n",
      "4700\n",
      "4800\n",
      "4900\n",
      "5000\n",
      "5100\n",
      "5200\n",
      "5300\n",
      "5400\n",
      "5500\n",
      "5600\n",
      "5700\n",
      "5800\n",
      "5900\n",
      "6000\n",
      "6100\n",
      "6200\n",
      "6300\n",
      "6400\n",
      "6500\n",
      "6600\n",
      "6700\n",
      "6800\n",
      "6900\n",
      "7000\n",
      "7100\n",
      "7200\n",
      "7300\n",
      "7400\n",
      "7500\n",
      "7600\n",
      "7700\n",
      "7800\n",
      "7900\n",
      "8000\n",
      "8100\n",
      "8200\n",
      "8300\n",
      "8400\n",
      "8500\n",
      "8600\n",
      "8700\n",
      "8800\n",
      "8900\n",
      "9000\n",
      "9100\n",
      "9200\n",
      "9300\n",
      "9400\n",
      "9500\n",
      "9600\n",
      "9700\n",
      "9800\n",
      "9900\n",
      "10000\n"
     ]
    }
   ],
   "source": [
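    "# NOTE: Token may already hold the tokens of the whole file (from the earlier Token = cut(...) cell),\n",
    "# in which case this loop appends lines 0-10000 a second time; run Token = [] first to avoid that.\n",
    "with open('article_9k.txt', encoding='utf-8') as f:\n",
    "    for i, line in enumerate(f):\n",
    "        if i % 100 == 0:\n",
    "            print(i)  # progress indicator\n",
    "        if i > 10000:\n",
    "            break  # only the first 10,001 lines are tokenized here\n",
    "        Token += cut(line)"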
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "with_jieba_cut = Counter(jieba.cut(articles[110]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### operator: Python's built-in operators as functions (e.g. add, mul)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "from operator import add, mul"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### functools:\n",
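    "Python's module for higher-order functions\n",
    "```\n",
    "functools.reduce is the built-in reduce of Python 2. It was moved here because Guido van Rossum, Python's BDFL, disliked the map/reduce style of functional programming and planned to drop map, filter and reduce from the built-ins. After strong pushback from the community, map and filter stayed built-in and only reduce moved to functools. Note that the built-in map in Python 3 returns an iterator, whereas in Python 2 it eagerly builds a list.\n",
    "reduce folds a sequence into a single value. Its signature is:\n",
    "reduce(function, iterable[, initializer])\n",
    "```\n",
    "Usage example:\n",
    "~~~\n",
    ">>> from functools import reduce\n",
    ">>> def foo(x, y):\n",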
    "...     return x + y\n",
    "...\n",
    ">>> l = range(1, 10)\n",
    ">>> reduce(foo, l)\n",
    "45\n",
    ">>> reduce(foo, l, 10)\n",
    "55\n",
    "~~~"
   ]
  },
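  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The example above does not show two details of `reduce`. This small sketch uses the `operator` functions imported earlier. First, the fold runs left to right. Second, the optional initializer is both the starting value and the result for an empty sequence. Without an initializer, an empty input raises `TypeError`.\n",
    "\n",
    "```python\n",
    "from functools import reduce\n",
    "from operator import add, mul\n",
    "\n",
    "# left fold: ((((1 * 2) * 3) * 4) * 5)\n",
    "factorial_5 = reduce(mul, range(1, 6))\n",
    "\n",
    "# the initializer 10 is the starting value of the fold\n",
    "total_from_10 = reduce(add, range(1, 10), 10)\n",
    "\n",
    "# with an initializer, an empty sequence just returns it\n",
    "empty_total = reduce(add, [], 0)\n",
    "\n",
    "try:\n",
    "    reduce(add, [])  # no initializer and nothing to fold\n",
    "except TypeError as err:\n",
    "    print('TypeError:', err)\n",
    "```"
   ]
  },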
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "from functools import reduce"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "words_count = Counter(Token)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('的', 703716),\n",
       " ('n', 382020),\n",
       " ('在', 263597),\n",
       " ('月', 189330),\n",
       " ('日', 166300),\n",
       " ('新华社', 142462),\n",
       " ('和', 134061),\n",
       " ('年', 123106),\n",
       " ('了', 121938),\n",
       " ('是', 100909),\n",
       " ('\\n', 89611),\n",
       " ('１', 88187),\n",
       " ('０', 84945),\n",
       " ('外代', 83268),\n",
       " ('中', 73926),\n",
       " ('中国', 71179),\n",
       " ('２', 70521),\n",
       " ('2017', 69894),\n",
       " ('记者', 62147),\n",
       " ('二线', 61998),\n",
       " ('将', 61420),\n",
       " ('与', 58309),\n",
       " ('等', 58162),\n",
       " ('为', 57019),\n",
       " ('5', 54578),\n",
       " ('照片', 52271),\n",
       " ('4', 51626),\n",
       " ('对', 50317),\n",
       " ('上', 47452),\n",
       " ('也', 47401),\n",
       " ('有', 45767),\n",
       " ('５', 40857),\n",
       " ('说', 39017),\n",
       " ('发展', 37632),\n",
       " ('他', 37194),\n",
       " ('３', 36906),\n",
       " ('以', 36867),\n",
       " ('国际', 35842),\n",
       " ('nn', 35330),\n",
       " ('４', 34659),\n",
       " ('比赛', 32232),\n",
       " ('６', 30575),\n",
       " ('到', 30109),\n",
       " ('人', 29572),\n",
       " ('从', 29489),\n",
       " ('6', 29002),\n",
       " ('都', 28027),\n",
       " ('不', 27963),\n",
       " ('后', 27393),\n",
       " ('当日', 27186),\n",
       " ('就', 26684),\n",
       " ('并', 26568),\n",
       " ('国家', 26439),\n",
       " ('７', 26386),\n",
       " ('企业', 26147),\n",
       " ('进行', 25987),\n",
       " ('3', 25491),\n",
       " ('美国', 25485),\n",
       " ('举行', 25389),\n",
       " ('被', 25277),\n",
       " ('北京', 25245),\n",
       " ('体育', 24873),\n",
       " ('2', 24376),\n",
       " ('1', 24182),\n",
       " ('这', 24118),\n",
       " ('新', 23828),\n",
       " ('但', 23385),\n",
       " ('比', 23229),\n",
       " ('个', 23081),\n",
       " ('足球', 22554),\n",
       " ('表示', 22134),\n",
       " ('经济', 22006),\n",
       " ('我', 21940),\n",
       " ('一个', 21932),\n",
       " ('９', 21920),\n",
       " ('还', 21861),\n",
       " ('合作', 21567),\n",
       " ('要', 21045),\n",
       " ('n5', 20946),\n",
       " ('已', 20882),\n",
       " ('摄', 20837),\n",
       " ('８', 20701),\n",
       " ('工作', 20700),\n",
       " ('n4', 20658),\n",
       " ('选手', 19986),\n",
       " ('我们', 19982),\n",
       " ('市场', 19001),\n",
       " ('一路', 18978),\n",
       " ('一带', 18907),\n",
       " ('建设', 18634),\n",
       " ('让', 18609),\n",
       " ('日电', 18384),\n",
       " ('通过', 18159),\n",
       " ('多', 17760),\n",
       " ('时', 17750),\n",
       " ('完', 17424),\n",
       " ('于', 17421),\n",
       " ('问题', 17338),\n",
       " ('更', 17275),\n",
       " ('项目', 17260)]"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "words_count.most_common(100)  # the 100 most frequent tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17618254"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(Token)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "21"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reduce(add, [1, 2, 3, 4, 5, 6])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 3, 43, 5]"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[1, 2, 3] + [3, 43, 5]"
   ]
  },
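  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `+` concatenates lists, `reduce(add, ...)` can merge a list of per-line token lists into one list. This is what `Token += cut(line)` does step by step. The sketch below uses hypothetical token lists. Each `+` inside `reduce` copies the accumulated list, so for a corpus of this size `itertools.chain.from_iterable` is a cheaper way to get the same result.\n",
    "\n",
    "```python\n",
    "from functools import reduce\n",
    "from itertools import chain\n",
    "from operator import add\n",
    "\n",
    "# hypothetical per-line token lists, like the output of cut(line)\n",
    "per_line_tokens = [['此外', '自', '本周'], ['小米', '手机'], ['MIUI', '9']]\n",
    "\n",
    "# builds a new list at every step: fine for small inputs\n",
    "merged_reduce = reduce(add, per_line_tokens, [])\n",
    "\n",
    "# walks the sub-lists lazily; list() materializes the result once\n",
    "merged_chain = list(chain.from_iterable(per_line_tokens))\n",
    "\n",
    "print(merged_reduce == merged_chain)\n",
    "```"
   ]
  },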
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "frequiences = [f for w, f in words_count.most_common(100)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[703716,\n",
       " 382020,\n",
       " 263597,\n",
       " 189330,\n",
       " 166300,\n",
       " 142462,\n",
       " 134061,\n",
       " 123106,\n",
       " 121938,\n",
       " 100909,\n",
       " 89611,\n",
       " 88187,\n",
       " 84945,\n",
       " 83268,\n",
       " 73926,\n",
       " 71179,\n",
       " 70521,\n",
       " 69894,\n",
       " 62147,\n",
       " 61998,\n",
       " 61420,\n",
       " 58309,\n",
       " 58162,\n",
       " 57019,\n",
       " 54578,\n",
       " 52271,\n",
       " 51626,\n",
       " 50317,\n",
       " 47452,\n",
       " 47401,\n",
       " 45767,\n",
       " 40857,\n",
       " 39017,\n",
       " 37632,\n",
       " 37194,\n",
       " 36906,\n",
       " 36867,\n",
       " 35842,\n",
       " 35330,\n",
       " 34659,\n",
       " 32232,\n",
       " 30575,\n",
       " 30109,\n",
       " 29572,\n",
       " 29489,\n",
       " 29002,\n",
       " 28027,\n",
       " 27963,\n",
       " 27393,\n",
       " 27186,\n",
       " 26684,\n",
       " 26568,\n",
       " 26439,\n",
       " 26386,\n",
       " 26147,\n",
       " 25987,\n",
       " 25491,\n",
       " 25485,\n",
       " 25389,\n",
       " 25277,\n",
       " 25245,\n",
       " 24873,\n",
       " 24376,\n",
       " 24182,\n",
       " 24118,\n",
       " 23828,\n",
       " 23385,\n",
       " 23229,\n",
       " 23081,\n",
       " 22554,\n",
       " 22134,\n",
       " 22006,\n",
       " 21940,\n",
       " 21932,\n",
       " 21920,\n",
       " 21861,\n",
       " 21567,\n",
       " 21045,\n",
       " 20946,\n",
       " 20882,\n",
       " 20837,\n",
       " 20701,\n",
       " 20700,\n",
       " 20658,\n",
       " 19986,\n",
       " 19982,\n",
       " 19001,\n",
       " 18978,\n",
       " 18907,\n",
       " 18634,\n",
       " 18609,\n",
       " 18384,\n",
       " 18159,\n",
       " 17760,\n",
       " 17750,\n",
       " 17424,\n",
       " 17421,\n",
       " 17338,\n",
       " 17275,\n",
       " 17260]"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frequiences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = list(range(100))  # ranks 0..99 for the x axis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0,\n",
       " 1,\n",
       " 2,\n",
       " 3,\n",
       " 4,\n",
       " 5,\n",
       " 6,\n",
       " 7,\n",
       " 8,\n",
       " 9,\n",
       " 10,\n",
       " 11,\n",
       " 12,\n",
       " 13,\n",
       " 14,\n",
       " 15,\n",
       " 16,\n",
       " 17,\n",
       " 18,\n",
       " 19,\n",
       " 20,\n",
       " 21,\n",
       " 22,\n",
       " 23,\n",
       " 24,\n",
       " 25,\n",
       " 26,\n",
       " 27,\n",
       " 28,\n",
       " 29,\n",
       " 30,\n",
       " 31,\n",
       " 32,\n",
       " 33,\n",
       " 34,\n",
       " 35,\n",
       " 36,\n",
       " 37,\n",
       " 38,\n",
       " 39,\n",
       " 40,\n",
       " 41,\n",
       " 42,\n",
       " 43,\n",
       " 44,\n",
       " 45,\n",
       " 46,\n",
       " 47,\n",
       " 48,\n",
       " 49,\n",
       " 50,\n",
       " 51,\n",
       " 52,\n",
       " 53,\n",
       " 54,\n",
       " 55,\n",
       " 56,\n",
       " 57,\n",
       " 58,\n",
       " 59,\n",
       " 60,\n",
       " 61,\n",
       " 62,\n",
       " 63,\n",
       " 64,\n",
       " 65,\n",
       " 66,\n",
       " 67,\n",
       " 68,\n",
       " 69,\n",
       " 70,\n",
       " 71,\n",
       " 72,\n",
       " 73,\n",
       " 74,\n",
       " 75,\n",
       " 76,\n",
       " 77,\n",
       " 78,\n",
       " 79,\n",
       " 80,\n",
       " 81,\n",
       " 82,\n",
       " 83,\n",
       " 84,\n",
       " 85,\n",
       " 86,\n",
       " 87,\n",
       " 88,\n",
       " 89,\n",
       " 90,\n",
       " 91,\n",
       " 92,\n",
       " 93,\n",
       " 94,\n",
       " 95,\n",
       " 96,\n",
       " 97,\n",
       " 98,\n",
       " 99]"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `%matplotlib inline` is an IPython magic command: it embeds plots directly in the notebook output, so the plt.show() call can be omitted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x138c7aa90>]"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAD8CAYAAACLrvgBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3XmUnFd55/HvU2tvai3dLVmWZLeM5R28dWRhCIsdZNlwkDODEzMwEsQ5yoCZgZCcxGT+cAIhB5I5IfgM0cSxBXICGI8DsfAINBphglls3N7kPWpLttVIuLvd2lqtXqrrmT/eW1KpVFvL3V1S9+9zqFNVt+773lsuoZ+e9771lrk7IiIi1YjVegIiInL6UGiIiEjVFBoiIlI1hYaIiFRNoSEiIlVTaIiISNUUGiIiUjWFhoiIVE2hISIiVUvUegITrbW11dvb22s9DRGR08rjjz/e5+5tlfpNu9Bob2+ns7Oz1tMQETmtmNmr1fTT4SkREamaQkNERKqm0BARkaopNEREpGoKDRERqZpCQ0REqlYxNMzsfDN7Ku920Mw+Y2bzzGyrme0I93NDfzOzO8ysy8y2m9kVeftaG/rvMLO1ee1XmtkzYZs7zMxCe9ExRESkNiqGhru/5O6XuftlwJXAIPA94DZgm7svA7aF5wDXA8vCbR2wHqIAAG4HrgKWA7fnhcD60De33arQXmqMCfe9J7v550eqOk1ZRGTGGu/hqWuBl939VWA1sDG0bwRuDI9XA/d45BFgjpktBK4Dtrp7v7vvA7YCq8Jrze7+C49+sPyegn0VG2PCPfj0Xr79y9cma/ciItPCeEPjZuDb4fECd98LEO7nh/ZFwO68bbpDW7n27iLt5cY4jpmtM7NOM+vs7e0d51uKpJMxhjPZk9pWRGSmqDo0zCwFfBD435W6Fmnzk2ivmrvf6e4d7t7R1lbx0ilF1SXiDGfGTmpbEZGZYjyVxvXAE+7+enj+eji0RLjvCe3dwJK87RYDeyq0Ly7SXm6MCZdOxhgaVaUhIlLOeELjwxw7NAWwCcidAbUWeCCvfU04i2oFcCAcWtoCrDSzuWEBfCWwJbx2yMxWhLOm1hTsq9gYEy6diDM8qkpDRKScqq5ya2YNwPuAP8hr/hJwn5ndArwG3BTaNwM3AF1EZ1p9HMDd+83sC8Bjod/n3b0/PP4E8A2gHvhBuJUbY8KlkzGGtKYhIlJWVaHh7oNAS0HbG0RnUxX2deDWEvvZAGwo0t4JXFKkvegYk6EuEWckk8XdCV8TERGRAvpGeJBORv8pdAaViEhpCo0gnYgDMKzFcBGRkhQaQV2oNIZ02q2ISEkKjUCVhohIZQqNQJWGiEhlCo1AlYaISGUKjUCVhohIZQqNQJWGiEhlCo3gaKWhS4mIiJSk0AiOVhr6cp+ISEkKjUCVhohIZQqNQJWGiEhlCo1AlYaISGUKjUCVhohIZQqNIJ1QpSEiUolCI4jFjFQ8pkpDRKQMhUae6HfCVWmIiJSi0MiTTsRVaYiIlKHQyFOXjDGsSkNEpKSqQsPM5pjZ/Wb2opm9YGZvN7N5ZrbVzHaE+7mhr5nZHWbWZWbbzeyKvP2sDf13mNnavPYrzeyZsM0dFn6ku9QYkyWd0JqGiEg51VYaXwV+6O4XAJcCLwC3AdvcfRmwLTwHuB5YFm7rgPUQBQBwO3AVsBy4PS8E1oe+ue1WhfZSY0yKumRcaxoiImVUDA0zawbeBdwN4O4j7r4fWA1sDN02AjeGx6uBezzyCDDHzBYC1wFb3b3f3fcBW4FV4bVmd/+FuztwT8G+io0xKVRpiIiUV02lcQ7QC3zdzJ40s7vMrBFY4O57AcL9/NB/EbA7b/vu0FauvbtIO2XGmBSqNEREyqsmNBLAFcB6d78cOEz5w0RWpM1Por1qZrbOzDrNrLO3t3c8mx5HlYaISHnVhEY30O3uj4bn9xOFyOvh
0BLhviev/5K87RcDeyq0Ly7STpkxjuPud7p7h7t3tLW1VfGWilOlISJSXsXQcPdfA7vN7PzQdC3wPLAJyJ0BtRZ4IDzeBKwJZ1GtAA6EQ0tbgJVmNjcsgK8EtoTXDpnZinDW1JqCfRUbY1Ko0hARKS9RZb//CnzTzFLATuDjRIFzn5ndArwG3BT6bgZuALqAwdAXd+83sy8Aj4V+n3f3/vD4E8A3gHrgB+EG8KUSY0wKVRoiIuVVFRru/hTQUeSla4v0deDWEvvZAGwo0t4JXFKk/Y1iY0wWVRoiIuXpG+F5VGmIiJSn0MiTqzSiYklERAopNPKkk/ohJhGRchQaeXI/xKTQEBEpTqGRpy5XaWhdQ0SkKIVGHlUaIiLlKTTy5CoNnUElIlKcQiOPKg0RkfIUGnlUaYiIlKfQyKNKQ0SkPIVGHlUaIiLlKTTypJOqNEREylFo5KlLqNIQESlHoZFHlYaISHkKjTyqNEREylNo5FGlISJSnkIjT1qVhohIWQqNPPGYkYybKg0RkRIUGgXqEvr1PhGRUhQaBdJJ/U64iEgpVYWGmb1iZs+Y2VNm1hna5pnZVjPbEe7nhnYzszvMrMvMtpvZFXn7WRv67zCztXntV4b9d4VtrdwYkymtSkNEpKTxVBrvdffL3L0jPL8N2Obuy4Bt4TnA9cCycFsHrIcoAIDbgauA5cDteSGwPvTNbbeqwhiTRpWGiEhpb+bw1GpgY3i8Ebgxr/0ejzwCzDGzhcB1wFZ373f3fcBWYFV4rdndf+HuDtxTsK9iY0yadCKuX+4TESmh2tBw4P+a2eNmti60LXD3vQDhfn5oXwTsztu2O7SVa+8u0l5ujElTp0pDRKSkRJX93uHue8xsPrDVzF4s09eKtPlJtFctBNk6gLPOOms8m54gnYhpTUNEpISqKg133xPue4DvEa1JvB4OLRHue0L3bmBJ3uaLgT0V2hcXaafMGIXzu9PdO9y9o62trZq3VFJdMq5KQ0SkhIqhYWaNZjYr9xhYCTwLbAJyZ0CtBR4IjzcBa8JZVCuAA+HQ0hZgpZnNDQvgK4Et4bVDZrYinDW1pmBfxcaYNKo0RERKq+bw1ALge+Es2ATwLXf/oZk9BtxnZrcArwE3hf6bgRuALmAQ+DiAu/eb2ReAx0K/z7t7f3j8CeAbQD3wg3AD+FKJMSaNKg0RkdIqhoa77wQuLdL+BnBtkXYHbi2xrw3AhiLtncAl1Y4xmVRpiIiUpm+EF1ClISJSmkKjgCoNEZHSFBoFcpVGdJRNRETyKTQKpBMx3GFkTIeoREQKKTQK1CWjH2LSuoaIyIkUGgXSieg/idY1REROpNAokM5VGqOqNERECik0CuQqjeGMKg0RkUIKjQK5NY0hVRoiIidQaBRQpSEiUppCo0Cd1jREREpSaBQ4evaUKg0RkRMoNAqo0hARKU2hUUCVhohIaQqNAqo0RERKU2gU0DfCRURKU2gU0LWnRERKU2gUOPY9DYWGiEghhUaBRDxGImY6PCUiUoRCo4h0IqZKQ0SkiKpDw8ziZvakmT0Yni81s0fNbIeZfcfMUqE9HZ53hdfb8/bxudD+kpldl9e+KrR1mdltee1Fx5hsdcm4Kg0RkSLGU2l8Gngh7/mXga+4+zJgH3BLaL8F2Ofu5wJfCf0ws4uAm4GLgVXA34cgigNfA64HLgI+HPqWG2NSqdIQESmuqtAws8XA+4G7wnMDrgHuD102AjeGx6vDc8Lr14b+q4F73X3Y3XcBXcDycOty953uPgLcC6yuMMakUqUhIlJctZXG3wF/AuT++d0C7Hf3THjeDSwKjxcBuwHC6wdC/6PtBduUai83xqRKqdIQESmqYmiY2QeAHnd/PL+5SFev8NpEtReb4zoz6zSzzt7e3mJdxkWVhohIcdVUGu8APmhmrxAdOrqGqPKYY2aJ0GcxsCc87gaWAITXZwP9+e0F25Rq7yszxnHc/U5373D3jra2tireUnla0xARKa5i
aLj759x9sbu3Ey1k/8jdPwI8BHwodFsLPBAebwrPCa//yN09tN8czq5aCiwDfgk8BiwLZ0qlwhibwjalxphUdck4w6o0RERO8Ga+p/GnwGfNrIto/eHu0H430BLaPwvcBuDuzwH3Ac8DPwRudfexsGbxKWAL0dlZ94W+5caYVKo0RESKS1Tucoy7/xj4cXi8k+jMp8I+Q8BNJbb/IvDFIu2bgc1F2ouOMdm0piEiUpy+EV5Ec32C/UdGaz0NEZFTjkKjiNamNPsHRxnRISoRkeMoNIpobUoD8Mbh4RrPRETk1KLQKKJtVhQafYdGajwTEZFTi0KjiFyl0TegSkNEJJ9Co4i2EBq9Cg0RkeMoNIponRVdgV2VhojI8RQaRTSkEjSk4lrTEBEpoNAoobUprUpDRKSAQqOE1qaUQkNEpIBCowRVGiIiJ1JolNA6K03fgNY0RETyKTRKaG1Ks29whMyYLiUiIpKj0CihrSmFO/QfVrUhIpKj0CihVV/wExE5gUKjhNz1p3oPKTRERHIUGiUcu/6UDk+JiOQoNEponaWLFoqIFFJolNCYilOXjNGnw1MiIkcpNEowM33BT0SkQMXQMLM6M/ulmT1tZs+Z2V+E9qVm9qiZ7TCz75hZKrSnw/Ou8Hp73r4+F9pfMrPr8tpXhbYuM7str73oGFMlCg2taYiI5FRTaQwD17j7pcBlwCozWwF8GfiKuy8D9gG3hP63APvc/VzgK6EfZnYRcDNwMbAK+Hszi5tZHPgacD1wEfDh0JcyY0wJVRoiIserGBoeGQhPk+HmwDXA/aF9I3BjeLw6PCe8fq2ZWWi/192H3X0X0AUsD7cud9/p7iPAvcDqsE2pMaZE2yxdtFBEJF9VaxqhIngK6AG2Ai8D+909E7p0A4vC40XAboDw+gGgJb+9YJtS7S1lxiic3zoz6zSzzt7e3mreUlVam9L0Hx5hLOsTtk8RkdNZVaHh7mPufhmwmKgyuLBYt3BvJV6bqPZi87vT3TvcvaOtra1Yl5PS2pQmq0uJiIgcNa6zp9x9P/BjYAUwx8wS4aXFwJ7wuBtYAhBenw3057cXbFOqva/MGFPi2Bf8dIhKRASqO3uqzczmhMf1wG8BLwAPAR8K3dYCD4THm8Jzwus/cncP7TeHs6uWAsuAXwKPAcvCmVIposXyTWGbUmNMidYm/Va4iEi+ROUuLAQ2hrOcYsB97v6gmT0P3Gtmfwk8Cdwd+t8N/JOZdRFVGDcDuPtzZnYf8DyQAW519zEAM/sUsAWIAxvc/bmwrz8tMcaU0LfCRUSOVzE03H07cHmR9p1E6xuF7UPATSX29UXgi0XaNwObqx1jquQuWth3SGsaIiKgb4SXNSudIJWI6fLoIiKBQqMMM6OtKa3rT4mIBAqNClqbUqo0REQChUYFuv6UiMgxCo0KdP0pEZFjFBoVLJxTR9/AMEOjY7WeiohIzSk0Klja2og7vNY/WOupiIjUnEKjgvaWRgB29R2u8UxERGpPoVFBLjReUWiIiCg0KpndkGRuQ5JX3tDhKRERhUYV2lsbVWmIiKDQqMrSlkZeeUOhISKi0KhCe2sjew8McWREp92KyMym0KhCe2u0GK7TbkVkplNoVKG9pQHQabciIgqNKuQqDa1riMhMp9CoQnNdkpbGlM6gEpEZT6FRpfbWRh2eEpEZT6FRpfaWRl7VF/xEZIZTaFSpvaWBXx/UabciMrNVDA0zW2JmD5nZC2b2nJl9OrTPM7OtZrYj3M8N7WZmd5hZl5ltN7Mr8va1NvTfYWZr89qvNLNnwjZ3mJmVG6MWtBguIlJdpZEB/sjdLwRWALea2UXAbcA2d18GbAvPAa4HloXbOmA9RAEA3A5cBSwHbs8LgfWhb267VaG91BhTbmmrLlwoIlIxNNx9r7s/ER4fAl4AFgGrgY2h20bgxvB4NXCPRx4B5pjZQuA6YKu797v7PmArsCq81uzuv3B3B+4p2FexMaZcrtLYpUpDRGaw
ca1pmFk7cDnwKLDA3fdCFCzA/NBtEbA7b7Pu0FauvbtIO2XGKJzXOjPrNLPO3t7e8bylqjWlE7Q2pXm1T4vhIjJzVR0aZtYE/AvwGXc/WK5rkTY/ifaqufud7t7h7h1tbW3j2XRc2lsaVGmIyIxWVWiYWZIoML7p7t8Nza+HQ0uE+57Q3g0sydt8MbCnQvviIu3lxqgJXSJdRGa6as6eMuBu4AV3/9u8lzYBuTOg1gIP5LWvCWdRrQAOhENLW4CVZjY3LICvBLaE1w6Z2Yow1pqCfRUboyYuOGMWPYeG6eo5VMtpiIjUTDWVxjuA/wxcY2ZPhdsNwJeA95nZDuB94TnAZmAn0AX8I/BJAHfvB74APBZunw9tAJ8A7grbvAz8ILSXGqMm/sMVi6lLxviHf9tZy2mIiNSMRScsTR8dHR3e2dk5afu//YFn+dYvX+Mnf/JeFs6un7RxRESmkpk97u4dlfrpG+Hj9Pu/eQ5Zh6//7JVaT0VEZMopNMZpybwG3v/WhXzr0dc4cGS01tMREZlSCo2T8AfvPoeB4Qz//MirtZ6KiMiUUmichIvPnM27zmvj6z97haFRXcBQRGYOhcZJ+i/vOoe+gWEeeOpXtZ6KiMiUUWicpLe/pYWLFjZz18O7mG5noImIlKLQOElmxu//5lJ29Azwb/8+Ode7EhE51Sg03oQPvO1MFjSnuevhXbWeiojIlFBovAmpRIy1V7fz064+nt9T7hqOIiLTg0LjTfrI8rNpSMW5+6eqNkRk+lNovEmzG5L8TscSNj39K37+cl+tpyMiMqkUGhPgE+95C2e3NPLRux7lzp+8rLOpRGTaUmhMgAXNdfzrre9g1SVn8FebX+RT33qSIyP60p+ITD8KjQnSlE7wtf90BZ+7/gI2P7uXdf/UqW+Li8i0o9CYQGbGH7z7LXz5P76Nh3f08clvPsFIJlvraYmITBiFxiT4nY4lfPG3L+FHL/Zw67ee4InX9rH3wBEyYwoQETm9JWo9genqI1edzWgmy59//3m2Pv86AOlEjP/10St57wXzazw7EZGTo9CYRB97x1Leff58dvUNsPfAEHc9vIu//D/P867z2ojHrNbTExEZN4XGJFva2sjS1kYA5jak+OQ3n+D7T+/hxssX1XhmIiLjV3FNw8w2mFmPmT2b1zbPzLaa2Y5wPze0m5ndYWZdZrbdzK7I22Zt6L/DzNbmtV9pZs+Ebe4wMys3xuls1cVncMEZs/jqth1a3xCR01I1C+HfAFYVtN0GbHP3ZcC28BzgemBZuK0D1kMUAMDtwFXAcuD2vBBYH/rmtltVYYzTVixm/OH7zmNX32H+9ak9tZ6OiMi4VQwNd/8J0F/QvBrYGB5vBG7Ma7/HI48Ac8xsIXAdsNXd+919H7AVWBVea3b3X3j0Nep7CvZVbIzT2sqLFnDxmc3csW0Ho6o2ROQ0c7Kn3C5w970A4T53OtAiYHdev+7QVq69u0h7uTFOa2bGZ993Hq/1D/KPD++s9XRERMZlor+nUeyUID+J9vENarbOzDrNrLO399T/QaRrLpjPDW89g7/+4Uvc99juyhuIiJwiTjY0Xg+Hlgj3PaG9G1iS128xsKdC++Ii7eXGOIG73+nuHe7e0dbWdpJvaeqYGV/53ct413lt3Pbd7Ty4XesbInJ6ONnQ2ATkzoBaCzyQ174mnEW1AjgQDi1tAVaa2dywAL4S2BJeO2RmK8JZU2sK9lVsjGkhnYjzDx+9ko6z5/GZe5/iv3/vGe56eCdbnvs1vz4wVOvpiYgUVfF7Gmb2beA9QKuZdROdBfUl4D4zuwV4DbgpdN8M3AB0AYPAxwHcvd/MvgA8Fvp93t1zi+ufIDpDqx74QbhRZoxpoz4V566PdfDZ7zzFg9v3cuDI6NHX3rpoNtdeOJ+rlrZw7vwmWptShLORRURqxqbbbz90dHR4Z2dnradxUg4MjvLKG4f52ct9/L/nX+fJ3fvJfTzNdQlueOtC/vyDF1OX
jNd2oiIy7ZjZ4+7eUamfvhF+CpndkOTShjlcumQOn3zPufQNDPP8noO83DvAM786wL2P7WZn72HuXHMlcxpStZ6uiMxAqjROI5ue3sMf3/c0i+fV8/WP/QZntzTWekoiMk2o0piGPnjpmSyYlWbdPz3Ou//mx1y4sJm3n9PCpUtmM7s+SXN9kua6BLPqksyqS1CfjGsdREQmlCqN09Du/kE2Pb2Hn7/cR+cr+xgu8UNPrU1p/uq3L2HlxWdM8QxF5HRTbaWh0DjNDY2O0b1vkANHMhwcGuXgkVEODWU4NJThwe17eG7PQda8/Wz+7IYLtYAuIiXp8NQMUZeMc+78WUVf+713tvPXP3yJu3+6ix+92MPiufUkYjEScaMxlaApnWB2Q5Ll7fO4+twWGlL64yAi5anSmAEeeqmHDT/dxfBoljF3RseyHB7OMDCcYd/gKCOZLKlEjOXt82hpShE3Ix4zWprSnNGc5ozZ9Vy4cBZnzWvQGonINKVKQ4567/nzee/5xa/3OJwZ47Fd+3jopR5+/vIb7N43yFjWyYw5bxweZnTs2D8qWpvSXHn2HK44ay6XnzWXty6aTX1Kh7xEZhKFxgyXTsR557JW3rms9YTXslmnf3CEPfuPsL37AI+/uo/OV/vZ8lz0m+fxmLFgVppZdUma6xO0NKZZPLeeRXPrOW/BLK48e67WUUSmGR2eknHrGxjm6d37eWr3fvYeGOLQ0CgHj2ToHRime98gQ6PR2VzpRIzlS+dx0ZnNxM0wg4ZUgqWtjZzT1kh7S6NCReQUocNTMmlam9Jce+ECrr1wwQmvuTt9AyM886v9PLyjj5/u6OORnW/gHl3zfix77B8pjak4Ny8/i99751IWzamfwncgIidLlYZMqcPDGXb1Hebl3gEeerGH72/fiwHXXXwGl581h/PPmMV5C2Yxf1Zai+4iU0jf05DTQve+QTb89BU2Pb2HvoHho+3pRCysjzQwK52gPhWnPhknlYhFt3iMWDjkFTOY3ZCirSnN/OY0zXVJmtIJGtJxmlIJYjGFj0glCg057bwxMMxLrx+iq2eA3f2D7O4/wp4DRzg8nOHIyBiDo2OMZrKMjGWPO6urHDNoSidorktydksDV549lyvOnsvbFs1mXqMuNy+SozUNOe20NKW5uinN1W858UyuQtms40RrKFmH/YMj9BwapvfQMAeHRhkcGePwcIaDQxkOHom+Kf/vPYf42kNd5JZV5jQkeUtbE2fOqachGac+FacxHacxnaAxFV27KxYzYhadKdaQStCQitOUTtDSlKK1Ka2FfJlxFBpyWjp2yCm6n99cx/zmuorbHR7O8PTu/bzw60O83DvAyz0DPPurAwyOZBgcGWNwZOy4xfpKmtIJ4iFYYmbRIbF0kqZ0nGQ8RiIeIxU36pJxGlPRIbNkOLSWC6N4zIibHX1PMTPiMY5unwx9EnEjFY/TkI72lU7EyC+UcofrovlY2I9hRBWXYaSTMeoScdLJWNhelZaMj0JDZpTGdIKrz23l6nOLVzPuznAmy0A4JAaQdWd0zDkyMsbhkei6Xm8MDNM3MEz/4VGy7mTdyWSjPoeGMhwezjCSyXJ4JDqkNjQabTs4PMZoNkvWo2ppzJ1aHSGOqqc4DakoyJLx2NEAM4t+yz4ZN1LxaB2pPhmnLqwtpROxo4GXiEWBl4hFoZar1hpC3/pUnHQiTjwWBVsiFgv7j4KsVG5ZCOKYQSIWzSGdiNGQSlCXVODVikJDJI9ZVBVM5WGnXHhkQ4C4w5g7mbFo/SYz5oxlo9twJnu0KhoaHTu6j/xTmnMhlnucC6WsR1cAGBqNQiwXgoPDY4yOZclknUw2SzZL2AdksllGMtEtd9jvyMix/mNjUViOuZPNRo+nQsygMZUgmYgdrfKOhUwUJrlgitmxSi4WgiqfWVTVNddHa1+N6cTRHom4Ma8xOhQ5rzEVKjc7OgcLoZYMwRpVkcf2G4/FSMbtaCgnYtHjdCIWwvT0
Cz+FhkiNxWJGjNPrL45SxrLO4ZGo0jo8HAVbdMseFyzux9akismFYH6ADofwyq1XDQxnojAN+422ccay0X3433EBmi34FQEn2m5kLMuhoQw9Bwc4PJw5+vrImLNvcGRchyzHKxk/9tnHzEgnYqSTcZIxOy5QrCAcDcA4+ifHzNiw9jc4q6Vh0uYKCg0RmUDxmNFcl6S5LlnrqUyYbDYKjn2D0aFI4GhA5aqy0XBG30gmezSIosovy0jGQ2UW9cmMOcOZMY6MjjE0MnZcdZarJoczWTJjxxIu68cCLlc9ZkPwRhOK7lKJ2KT/9zjlQ8PMVgFfBeLAXe7+pRpPSURmkFi44nNLU7rWUzklTH4svQlmFge+BlwPXAR82Mwuqu2sRERmrlM6NIDlQJe773T3EeBeYHWN5yQiMmOd6qGxCNid97w7tImISA2c6qFR7JSSE05jMLN1ZtZpZp29vb1TMC0RkZnpVA+NbmBJ3vPFwJ7CTu5+p7t3uHtHW1vblE1ORGSmOdVD4zFgmZktNbMUcDOwqcZzEhGZsU7pU27dPWNmnwK2EJ1yu8Hdn6vxtEREZqxTOjQA3H0zsLnW8xARkWn4expm1gu8epKbtwJ9Ezid08VMfN8z8T3DzHzfes/VOdvdKy4KT7vQeDPMrLOaHyGZbmbi+56J7xlm5vvWe55Yp/pCuIiInEIUGiIiUjWFxvHurPUEamQmvu+Z+J5hZr5vvecJpDUNERGpmioNERGpmkIjMLNVZvaSmXWZ2W21ns9kMLMlZvaQmb1gZs+Z2adD+zwz22pmO8L93FrPdaKZWdzMnjSzB8PzpWb2aHjP3wlXHJhWzGyOmd1vZi+Gz/zt0/2zNrM/DH+2nzWzb5tZ3XT8rM1sg5n1mNmzeW1FP1uL3BH+bttuZle8mbEVGsyo3+3IAH/k7hcCK4Bbw/u8Ddjm7suAbeH5dPNp4IW8518GvhLe8z7glprManJ9Ffihu18AXEr0/qftZ21mi4D/BnS4+yVEV5G4men5WX8DWFXQVuqzvR5YFm7rgPVvZmCFRmRG/G6Hu+919yfC40NEf4ksInqvG0O3jcCNtZnh5DCzxcD7gbvCcwOuAe4PXabje24G3gXcDeDuI+6+n2n+WRNd5aLUEGY0AAACSUlEQVTezBJAA7CXafhZu/tPgP6C5lKf7WrgHo88Aswxs4UnO7ZCIzLjfrfDzNqBy4FHgQXuvheiYAHm125mk+LvgD8Bcj+63ALsd/dMeD4dP+9zgF7g6+Gw3F1m1sg0/qzd/VfA/wBeIwqLA8DjTP/POqfUZzuhf78pNCJV/W7HdGFmTcC/AJ9x94O1ns9kMrMPAD3u/nh+c5Gu0+3zTgBXAOvd/XLgMNPoUFQx4Rj+amApcCbQSHRoptB0+6wrmdA/7wqNSFW/2zEdmFmSKDC+6e7fDc2v58rVcN9Tq/lNgncAHzSzV4gOO15DVHnMCYcwYHp+3t1At7s/Gp7fTxQi0/mz/i1gl7v3uvso8F3gaqb/Z51T6rOd0L/fFBqRGfG7HeFY/t3AC+7+t3kvbQLWhsdrgQemem6Txd0/5+6L3b2d6HP9kbt/BHgI+FDoNq3eM4C7/xrYbWbnh6ZrgeeZxp810WGpFWbWEP6s597ztP6s85T6bDcBa8JZVCuAA7nDWCdDX+4LzOwGon+B5n6344s1ntKEM7N3Ag8Dz3Ds+P6fEa1r3AecRfR/vJvcvXCR7bRnZu8B/tjdP2Bm5xBVHvOAJ4GPuvtwLec30czsMqLF/xSwE/g40T8Up+1nbWZ/Afwu0ZmCTwK/T3T8flp91mb2beA9RFezfR24HfhXiny2IUD/J9HZVoPAx92986THVmiIiEi1dHhKRESqptAQEZGqKTRERKRqCg0REamaQkNERKqm0BARkaopNEREpGoKDRERqdr/B9XCkNMuG9ntAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(x, frequiences)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
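    "## Summary\n",
    "- In AI work, most of the effort (roughly 65%) goes into data preprocessing\n",
    "- Make a habit of saving important intermediate results to disk promptly\n",
    "- An important regularity in NLP (Zipf's law): in a large text, the second most frequent word occurs about 1/2 as often as the most frequent one, and the n-th most frequent word about 1/n as often"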
   ]
  },
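  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a rough numeric check of the 1/n rule (Zipf's law), as a minimal sketch. If frequency is proportional to 1/rank, the ratio f(1)/f(r) should be close to r. `zipf_ratios` is a hypothetical helper. The demo `Counter` reuses the top-5 counts printed by `words_count.most_common(100)` above. On the real data you could pass `words_count` directly.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def zipf_ratios(counter, top=10):\n",
    "    # (rank, f(1) / f(rank)) for the `top` most common items; Zipf predicts about rank\n",
    "    ranked = [freq for _, freq in counter.most_common(top)]\n",
    "    return [(rank, ranked[0] / freq) for rank, freq in enumerate(ranked, start=1)]\n",
    "\n",
    "demo = Counter({'的': 703716, 'n': 382020, '在': 263597, '月': 189330, '日': 166300})\n",
    "for rank, ratio in zipf_ratios(demo, top=5):\n",
    "    print(rank, round(ratio, 2))\n",
    "```\n",
    "\n",
    "On these counts the ratios for ranks 2 to 5 come out at about 1.84, 2.67, 3.72 and 4.23, so the rule holds only approximately."
   ]
  },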
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ Pr(sentence) = Pr(w_1 w_2 \\cdots w_n) \\approx \\prod_{i=1}^{n-1} \\frac{\\#(w_i w_{i+1})}{\\#(w_{i+1})} \\cdot Pr(w_n) $$"
   ]
  },
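  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The formula above can be sketched as a tiny 2-gram model. Count single words and adjacent pairs, multiply $\\#(w_i w_{i+1}) / \\#(w_{i+1})$ along the sentence, and finish with $Pr(w_n)$. This is a minimal sketch on a hypothetical toy corpus with no smoothing, so any unseen pair gives probability 0. On the real data you would build the model from `Token` instead of `corpus`.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def make_bigram_model(tokens):\n",
    "    unigram = Counter(tokens)                  # #(w)\n",
    "    bigram = Counter(zip(tokens, tokens[1:]))  # #(w_i w_{i+1})\n",
    "    total = len(tokens)\n",
    "\n",
    "    def prob(sentence):\n",
    "        # Pr(w_n): unigram probability of the last word\n",
    "        p = unigram[sentence[-1]] / total\n",
    "        # multiply #(w_i w_{i+1}) / #(w_{i+1}) for i = 1..n-1\n",
    "        for w, nxt in zip(sentence, sentence[1:]):\n",
    "            if unigram[nxt] == 0:\n",
    "                return 0.0  # unseen word: no estimate without smoothing\n",
    "            p *= bigram[(w, nxt)] / unigram[nxt]\n",
    "        return p\n",
    "\n",
    "    return prob\n",
    "\n",
    "corpus = ['我', '爱', '北京', '我', '爱', '上海']  # hypothetical toy corpus\n",
    "prob = make_bigram_model(corpus)\n",
    "print(prob(['我', '爱', '北京']))\n",
    "print(prob(['爱', '我']))\n",
    "```\n",
    "\n",
    "Here `prob(['我', '爱', '北京'])` is 1/6, because $Pr(北京) = 1/6$ and both pair ratios equal 1. `prob(['爱', '我'])` is 0 because the pair 爱 我 never occurs in the corpus."
   ]
  },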
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x1391ba588>]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD8CAYAAABw1c+bAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xl8HOWd5/HPr1ut1mndkmVL8n3iCyMc7pgjxBgCCQQIMBMyMeshCUtmZ2cHssxmdoZJwiRzkAwkwQMMkCVOMgQSYg7bXDEOGCyDMb7vQ5ZtSZYsW5J1P/tHtxlFblmypFZJ3d/366VXdz2qUv3qVfa3q5+qesqcc4iISPzweV2AiIgMLgW/iEicUfCLiMQZBb+ISJxR8IuIxBkFv4hInFHwi4jEGQW/iEicUfCLiMSZBK8LiCQ3N9eNHTvW6zJERIaNdevWVTvn8noz75AM/rFjx1JWVuZ1GSIiw4aZ7evtvOrqERGJMwp+EZE4o+AXEYkzPQa/mT1pZpVmtrFT24NmtsHM1pvZCjMb1c2y7eF51pvZiwNZuIiI9E1vjvifAhZ0afuBc26Wc24OsAz4djfLnnTOzQn/XN+POkVEZID0GPzOuVVATZe2450mUwE9zUVEZJjocx+/mX3HzA4Ad9D9EX+SmZWZ2Roz+3wPf29xeN6yqqqqvpYlIiI96HPwO+cecM4VA88C93QzW4lzrhS4HXjYzCac4e8tcc6VOudK8/J6dQ9C1+X5t9d38Pvt+tAQETmTgbiq5+fATZF+4ZyrCL/uBt4Czh2A9UVkZix5ezdvbq2M1ipERGJCn4LfzCZ1mrwe2BphniwzC4bf5wIXA5v7sr7eyksPUnWiOZqrEBEZ9nocssHMlgLzgVwzKwf+FlhoZlOADmAfcHd43lLgbufcXcA04DEz6yD0AfOQcy66wZ+m4BcR6UmPwe+cuy1C8xPdzFsG3BV+/w4ws1/VnaW89CCbKo73PKOISByLqTt389OTdMQvItKDmAr+vPQg9c1tNLa0eV2KiMiQFXPBD+ioX0TkDBT8IiJxJraCP03BLyLSk9gK/lNH/PUKfhGR7sRU8GenJuIzHfGLiJxJTAW/32fkpgWpPK7gFxHpTkwFP4SHbVBXj4hIt2Iz+NXVIyLSrdgLfo3XIyJyRrEX/OlBquub6ejQQ8FERCKJyeBv63AcO9nqdSkiIkNSTAY/QOWJJo8rEREZmmIu+PPTkwBdyy8i0p2YC36N1yMicmYKfhGRONOr4DezJ82s0sw2dmp70Mw2mNl6M1thZqO6WfZOM9sR/rlzoArvTmqin+SAX8EvItKN3h7xPwUs6NL2A+fcLOfcHGAZ8O2uC5lZNqFn9H4KmAf8rZll9b3cnpkZeelBKhX8IiIR9Sr4nXOrgJoubZ0fbpsKRLpw/rPASudcjXOuFljJ6R8gA05374qIdK/Hh62fiZl9B/gyUAdcHmGW0cCBTtPl4baoyk8PsqOyPtqrEREZlvp1ctc594Bzrhh4FrgnwiwWabFIf8vMFptZmZmVVVVV9acsHfGLiJzBQF3V83Pgpgjt5UBxp+kioCLSH3DOLXHOlTrnSvPy8vpVTF5akLqTrTS3tffr74iIxKI+B7+ZTeo0eT2wNcJsy4GrzSwrfFL36nBbVJ26pLO6viXaqxIRGXZ61cdvZkuB+UCumZUTulJnoZlNATqAfcDd4XlLgbudc3c552rM7EFgbfhP/b1zrua0FQywT4ZtON7E6MzkaK9ORGRY6VXwO+dui9D8RDfzlgF3dZp+EniyT9X1kW7iEhHpXszduQt66LqIyJnEZPDnpumIX0SkOzEZ/AG/j+zURI7ooesiIqeJyeAHKM5KZn9Ng9dliIgMOTEb/BPy0thdpeAXEekqdoM/P41DdU3UN7d5XYqIyJASu8GflwrAHh31i4j8kRgO/jQAdlVpsDYRkc5iNvhLclLw
mYJfRKSrmA3+YIKfkuwUneAVEekiZoMfQt09OuIXEfljsR38+Wnsrm6gvSPiIwBEROJSbAd/XiotbR0crD3pdSkiIkNGTAf/eF3ZIyJympgOfl3SKSJyupgO/uzURLJSAuzSlT0iIp+I6eAHXdkjItJVXAT/bgW/iMgnegx+M3vSzCrNbGOnth+Y2VYz22BmL5hZZjfL7jWzj81svZmVDWThvTUhP5Xq+hbqGlu9WL2IyJDTmyP+p4AFXdpWAjOcc7OA7cC3zrD85c65Oc650r6V2D/jc8MneKt11C8iAr0IfufcKqCmS9sK59yp8Y7XAEVRqG1ATMgPB3+lgl9EBAamj/+rwCvd/M4BK8xsnZktHoB1nbXirGQCftOVPSIiYQn9WdjMHgDagGe7meVi51yFmeUDK81sa/gbRKS/tRhYDFBSUtKfsv5Igt/H2JxUduqIX0QE6McRv5ndCVwH3OGcizgYjnOuIvxaCbwAzOvu7znnljjnSp1zpXl5eX0tK6LJBelsP3JiQP+miMhw1afgN7MFwH3A9c65xm7mSTWz9FPvgauBjZHmjbYpI9PZX9NIY4sewygi0pvLOZcC7wJTzKzczBYBjwDphLpv1pvZT8PzjjKzl8OLFgCrzewj4H3gJefcq1HZih5MLkgHYPsRdfeIiPTYx++cuy1C8xPdzFsBLAy/3w3M7ld1A2TqyHDwHz7BnOKItxyIiMSNmL9zF6AkO4WkgI+th9XPLyISF8Hv8xmTC9LZduS416WIiHguLoIfYEpBOtsOq49fRCR+gn9kOtX1zRytb/a6FBERT8VV8ANs0/X8IhLn4i/4dYJXROJc3AR/XlqQrJSAgl9E4l7cBL+ZMWVkurp6RCTuxU3wQ+jKnu2HT9DREXFoIRGRuBBfwT9yBA0t7Rw8dtLrUkREPBNnwa8TvCIicRX8kwtCT+NSP7+IxLO4Cv70pACjM5N1xC8icS2ugh9gTkkmb++ooqm13etSREQ8EXfB/+ULxlDb2MrzHxz0uhQREU/EXfDPG5fNzNEZPLF6ty7rFJG4FHfBb2YsumQcu6oa+P32Kq/LEREZdHEX/AALZxYyckQSj6/e7XUpIiKDrjfP3H3SzCrNbGOnth+Y2VYz22BmL5hZxOcZmtkCM9tmZjvN7P6BLLw/EhN83HnRWP6w8yhbDunhLCISX3pzxP8UsKBL20pghnNuFrAd+FbXhczMDzwKXANMB24zs+n9qnYA3T6vhOSAn8ff3uN1KSIig6rH4HfOrQJqurStcM61hSfXAEURFp0H7HTO7XbOtQC/AG7oZ70DJiMlwI1zR7NsQwXHm1q9LkdEZNAMRB//V4FXIrSPBg50mi4Pt0VkZovNrMzMyqqqBuek682lxTS3dfDShkODsj4RkaGgX8FvZg8AbcCzkX4doa3b6yedc0ucc6XOudK8vLz+lNVrs4symJifxq/XlQ/K+kREhoI+B7+Z3QlcB9zhnIsU6OVAcafpIqCir+uLBjPjprlFlO2rZU91g9fliIgMij4Fv5ktAO4DrnfONXYz21pgkpmNM7NE4EvAi30rM3q+cO5ofAbPf6CjfhGJD725nHMp8C4wxczKzWwR8AiQDqw0s/Vm9tPwvKPM7GWA8Mnfe4DlwBbgV865TVHajj4bmZHEJZPyeP6Dg7qTV0TiQkJPMzjnbovQ/EQ381YACztNvwy83OfqBslNc0fzzV+sZ83uo1w0MdfrckREoiou79zt6rPnjCQ9mMBz6u4RkTig4AeSAn4+f+5ofru+guWbDntdjohIVCn4w+6/ZiqzijL47z//kFUavE1EYpiCPyw1mMBTX5nHxPw0Fv+sjPf31PS8kIjIMKTg7yQjJcAzi+YxOjOZRU+vpeLYSa9LEhEZcAr+LnLTgvzHV+bR3uG479cbiHxvmojI8KXgj6AkJ4VvLZzG2zuqefa9/V6XIyIyoBT83fiTT5VwycRcvvvyFvYf7e7mZBGR4UfB3w0z4x+/
OAu/Gfcs/YBfrt3Pun211De39bywiMgQ1uOdu/FsdGYy371xJn/93Abu+/XHACQFfLz5V/MpzEj2uDoRkb5R8Pfgc7NHsXBmIeW1jazbV8tf/uojVm4+wpcvHOt1aSIifaKunl7w+4wxOancOLeI8bmprNx8xOuSRET6TMF/lq6aXsCa3Uc5occ1isgwpeA/S1dNK6C13bFqe7XXpYiI9ImC/yzNLckkKyXAa1vU3SMiw5OC/ywl+H1cPjWfN7ZW0tbe4XU5IiJnTcHfB5+ZVkDdyVbK9tV6XYqIyFnrzaMXnzSzSjPb2KntZjPbZGYdZlZ6hmX3mtnH4cczlg1U0V67dHIeiX4fr+nqHhEZhnpzxP8UsKBL20bgRmBVL5a/3Dk3xznX7QfEcJMWTODCCTms3HJEg7iJyLDTY/A751YBNV3atjjntkWtqmHgqukF7DvayOZDx70uRUTkrES7j98BK8xsnZktjvK6BtV1MwtJSfSzZNVur0sRETkr0Q7+i51zc4FrgG+Y2WXdzWhmi82szMzKqqqG/qMPs1IT+dMLxvC7jyrYU93gdTkiIr0W1eB3zlWEXyuBF4B5Z5h3iXOu1DlXmpeXF82yBsxdl44n4Pfx4zd3el2KiEivRS34zSzVzNJPvQeuJnRSOGbkpQe5bV4JL3x4kAM1GrNfRIaH3lzOuRR4F5hiZuVmtsjMvmBm5cCFwEtmtjw87ygzezm8aAGw2sw+At4HXnLOvRqdzfDOn396PD4zHlu1y+tSRER6pcdhmZ1zt3XzqxcizFsBLAy/3w3M7ld1w0BhRjJfLC3iV2vLmTJyBPMn51GcneJ1WSIi3dJ4/APgG5dPZM2uo/yf34R6ssbnpXLvFZO4Yc4ozMzj6kRE/pgNxRuQSktLXVnZ8LrR1znHrqoGVm2v4oUPD/LxwTqumJrPP3x+BqMy9bQuEYkuM1vX2xtlFfxR0N7hePqdvfxg+Tb8PuPW84u5fvYoZhVl6BuAiESFgn+IOFDTyPde2cJrmytpae9gTE4KP/jibOaNy/a6NBGJMWcT/BqdM4qKs1P48R3nsfZvruL7N82ird3x7d9upKNj6H3Yikj8UPAPgozkALecX8xffXYyWw+f4PWtlV6XJCJxTME/iD43axQl2Sk88sYOjeopIp5R8A+iBL+Pr8+fwEfldby9Q8/sFRFvKPgH2Y1ziyjMSOKRNzS+j4h4Q8E/yBITfNz96Qm8v7eGd3bqqF9EBp+C3wO3nl9MYUYSi54u48nVe2jXVT4iMogU/B5ICvh5/usXccH4bP5+2WZu/uk7vLm1kvLaRl3qKSJRpxu4POSc4zfrD/J3v9vMscZWAJIDfibkpzIhL40JeWksmDGSyQXpHlcqIkOd7twdZuqb29hccZydlfXsqDzBrqoGdlXWc/DYSbJSAqy+7wpSgxpPT0S6dzbBrzQZAtKCCcwbl33aUA7r9tVw00/e5dn39rH4sgkeVScisUZ9/EPYeWOyuWRiLktW7eFkS7vX5YhIjFDwD3H3XjmJ6vpmlr6/3+tSRCRGKPiHuHnjsvnUuGx++vtdNLXqqF9E+q83z9x90swqzWxjp7abzWyTmXWYWbcnE8xsgZltM7OdZnb/QBUdb7555SQqTzTz9Dt7qW1ooam1XWP9iEif9ebk7lPAI8Azndo2AjcCj3W3kJn5gUeBzwDlwFoze9E5t7nP1capCyfkUDomi++9spXvvbIVgKSAjwl5aUzKT+O8sdncMa8En08PeRGRnvXmYeurzGxsl7YtQE9Pk5oH7Aw/dB0z+wVwA6DgP0tmxr9/uZQVmw/T2NLOydZ2qk+0sLOqnvf31PCb9RW8t/so/3zLbIIJfq/LFZEhLpqXc44GDnSaLgc+FcX1xbSs1ERuPb/ktHbnHEtW7eZ7r2ylpqGFx/70PNKTAh5UKCLDRTSDP9LXgW47ps1sMbAYoKTk9ICTyMyMP//0BPLSg/z1cxu47t9WM7kgnZREP1kp
iVwxNZ+LJuSQ4Nd5fBEJiWbwlwPFnaaLgIruZnbOLQGWQOjO3SjWFZNunFtETlqQR9/YyYGaRhpb2qk60cxT7+wlJzWRa2cVcu+Vk8hNC3pdqoh4LJrBvxaYZGbjgIPAl4Dbo7i+uPfpyXl8enLeJ9NNre28ta2K322o4BfvH2D5psM8cvtczh+rh72LxLPeXM65FHgXmGJm5Wa2yMy+YGblwIXAS2a2PDzvKDN7GcA51wbcAywHtgC/cs5titaGyOmSAn4WzBjJo7fP5YVvXERSwM+Xlqzhp7/fpVFAReKYBmmLI8ebWrnvuQ28svEwV07N559unk1WaqLXZYnIADibQdp0xi+OjEgK8OM75vJ/PzedVTuquPZHb7NuX43XZYnIINPonHHGzPjKxeOYOyaLe37+Ibc8toZ5Y7OZU5LJnOJMLp2US0qi/lmIxDL9D49Ts4oyWXbvJTzyxk7W7D7Kv6/aTVuHIyM5wB2fKuHOi8ZSMCLJ6zJFJArUxy9A6AqgD/bV8rM1+1i+6TB+n/GXn5nC1+brOQAiw4EexCJnLSng56KJuVw0MZf9Rxt56NUt/OOrW3E4vj5/otflicgAUvDLaUpyUvi32+YS8K/n+69uI8FnegKYSAxR8EtEfp/xzzfPpq3D8d2Xt/La5kqCAR8Bv4/SsVksumScBoQTGaZ0Oad0K8Hv4+Fb5/DVi8fR4RwnmtqoOHaS77+6jQUPv83bO6q8LlFE+kAnd+Wsrdpexbd/u5G9Rxv5b5eO44Frp3tdkkjc0w1cElWXTc7j1b+4jC+eV8Tjq/ew48gJr0sSkbOg4Jc+SQr4eWDhNFICfh5+fYfX5YjIWVDwS59lpSby1UvG8dKGQ2w5dNzrckSklxT80i93XTKe9GACD7+23etSRKSXFPzSLxkpARZdOo7lm46w8WCd1+WISC8o+KXfvnrJOEYkJfDgss00trR5XY6I9EDBL/02IinA31w7nbV7a7jxx++w/2ij1yWJyBko+GVA3HJ+MU/92TwO1TXxuUdW89a2Sq9LEpFuKPhlwFw2OY8X77mYwowkvvIfa/lf//kRtQ0tXpclIl305pm7T5pZpZlt7NSWbWYrzWxH+DWrm2XbzWx9+OfFgSxchqYxOam88PWL+dr8Cbzw4UGu/Jff87N397K7ql7P+RUZInocssHMLgPqgWecczPCbd8HapxzD5nZ/UCWc+6+CMvWO+fSzrYoDdkQG7YePs7/fv5jPth/DID0YALTCkcwNjeFkuwURmUmE0zwk+A30oIJfGpcNgl+fQkV6YuzGbKhV2P1mNlYYFmn4N8GzHfOHTKzQuAt59yUCMsp+ONcR4dj25ETfHywjo/L69hy6Dj7ahqpOtF82rxjclL4+vwJfOHcIhIT9AEgcjYGI/iPOecyO/2+1jl3WnePmbUB64E24CHn3G/OsI7FwGKAkpKS8/bt29eb+mWYamxp43BdE63tjraODvYdbeQnb+3i44N1jMpI4jPTC7hgfA7zxmWTkxb0ulyRIW8oBf8o51yFmY0H3gCudM7t6ml9OuKPT8453tpexVN/2MvavTU0trQDMLckk2tnjWLhzJEUZiR7XKXI0DQYj148YmaFnbp6Il6755yrCL/uNrO3gHOBHoNf4pOZcfmUfC6fkk9rewcfH6xj9Y5qXtl4mAeXbebBZZv59nXT+eol47wuVWRY62tH6ovAneH3dwK/7TqDmWWZWTD8Phe4GNjcx/VJnAn4fcwtyeLeKyfxyjcv5c2/ms9V0/L5h5c284ed1V6XJzKs9eZyzqXAu8AUMys3s0XAQ8BnzGwH8JnwNGZWamaPhxedBpSZ2UfAm4T6+BX80ifjclP54ZfOZWJ+Gvf8/APKa3V3sEhf6QlcMqzsqW7g+kdWU5Kdwq+/dhFJAT33VwT0BC6JYeNyU3n41jlsqjjO3f9vnQaFE+kDBb8MO1dOK+B7N85k1fYq/uTx9zjWqGEhRM6Ggl+GpdvmlfDo
7XPZePA4tzz2LpsrjjMUuy1FhqK+Xs4p4rlrZhaSkRJg8TPrWPijt8lLD3LJxFzmT8lj/pR8MpIDXpcoMiTp5K4Me5XHm3hrWxWrd1bzh53VHG1oIcFnXDghh1lFGaQGE0gLJlCUlcyF43NJTtQJYYk9A37n7mBT8EtfdXQ41pcfY8WmI6zYfJi91Q10HhQ0McHHheNzuHZWITfNLcLvM++KFRlACn6RMOcczW0d1De3sfXQCd7cVskbWyvZU93A9MIRPPj5czhvTLbXZYr0m4Jf5Aycc7z08SG+89IWDtU1cdnkPHLTEkkO+MlNC7JwZiFTRqZ7XabIWVHwi/RCY0sbj765kxWbjtDY0k5Tazu1jS10OJg6Mp1rZxYyqSCd4uxkxuSkkhbUtRAydCn4Rfqour6ZZR9V8Jv1Faw/cOyTdp/B+WOzuWbGSK6aXsCojGR8Oj8gQ4iCX2QA1DW2cqC2kf01jWyuOM7yTYfZUVkPgN9nZCYHyEpNJDs1kZzwa2owgaQEH0mJfkZnJjMhL43xeamkJOrbgkSXgl8kSnZW1rN6RxXV9S3UNLZQ29DC0YYWasI/jS1tNLV2nLZcbloiBSOSKMxIYnZRJgtnFTIh76wfTifSLQW/iIecczS1dnCgtpGdlfXsqqynou4kh+qaOHSsiW1HTgCh8wjnjMrADAxICvgZkZxARnKAzOREslITyUoJMCI5QILPCPh9jEgO6MY0iWgwHsQiIt0wM5IT/UwuSGdywelXBx2qO8krHx/mlY2HeHdXNacOvZpa2zne1EZ7x5kPxqYVjuCiCTlcMjGXiyfm6vnEctZ0xC8yhDjnqG9u41hjK8caW6lpbOFEUytt7Y6W9g4O1zWxZvdRyvbV0tLWQUZygIUzR35yCWpeWhAznXSOR+rqEYlxTa3tvLOrmhfXV7Bi85FPnk+ckuinKCuZEUkBUoIJpAcTmDIyndnFmcwuyiAzJdHjyiVa1NUjEuOSAn6umFrAFVMLaGxp4/09New72sjeow0crD1JQ0sbx0+2sv9oAy9vPMSp47vZxZksnDGSa2YUUpKT4u1GiGd6dcRvZk8C1wGVzrkZ4bZs4JfAWGAvcItzrjbCsncCfxOe/Afn3NM9rU9H/CID53hTKxvL6yjbV8uKzYfZePA4AOnBBEZlJjMqM4nCzGQKR4Re543N1ofCMDTgXT1mdhlQDzzTKfi/D9Q45x4ys/uBLOfcfV2WywbKgFLAAeuA8yJ9QHSm4BeJngM1jby25Qh7qxuoqGui4thJDtc1cbThvx5oc+mkXG6fV8KV0wp08niYGPCuHufcKjMb26X5BmB++P3TwFvAfV3m+Syw0jlXEy5sJbAAWNqb9YrIwCvOTuHPLh53WntTazvltSd5+eND/OL9/Xzt2Q/ISA5w1bQCrpkxkgsm5GjYihjRn71Y4Jw7BOCcO2Rm+RHmGQ0c6DRdHm4TkSEmKeBnYn4a9145iW9cPpFV26v43YYKVmw+zK8/KAcgNdFPwYikT+5STgsmkBTwYxYa1sJnhpnh90GCz0dq0E9KYgKZKQHmFGcydeQIDYU9BET74zvSHo7Yt2Rmi4HFACUlJdGsSUR64PcZl0/N5/Kp+bS0dfDOrmq2HT7BkePNHDnRRG1DC8caWzhQ20hzawfOOTocdHR6bW3voLGl/Y/uS0gLJlA6NotbSou5enoBCX51I3mhP8F/xMwKw0f7hUBlhHnK+a/uIIAiQl1Cp3HOLQGWQKiPvx91icgASkzwMX9KPvOnRPpSf2bOhe4/qDzezAf7a1m7t4Y3t1bx9Wc/oDAjiVvPL2ZyQTpZKYnkpCWSnhT6FpGamKBB8KKoP8H/InAn8FD49bcR5lkOfNfMssLTVwPf6sc6RWQYMTOCCX6Ks1Mozk7hhjmjae9wvLm1kqff3cvDr+3odtmkgI/kgJ/kgJ+ctCCFGUmMykwmpdOjMxN8RjA8z9wxWcwpzhyErRr+ehX8ZraU0JF7rpmV
A39LKPB/ZWaLgP3AzeF5S4G7nXN3OedqzOxBYG34T/39qRO9IhKf/D7jqukFXDW9gKP1zVSeaKYmPNhdfVMbDc1tnGhuo6k19IyExpZ2qk40s/doA+/uOkpTW+hmNeegrcvwFp+bPYr7r5nK6MxkLzZt2NCduyIybHV+tOYz7+zlsVW7MYMrpuaT4PNhFhrbaPGl42O+60hDNohIXDp47CT/tHwb6w8cwzlHW4ejvPYk18wYyb/eOoekgL/nPzJMacgGEYlLozOT+ddb53wy7ZzjidV7+M7LWzjy72t4/M7zyU7VeEUKfhGJWWbGXZeOZ1RmMn/xy/Vc8N3XSU70E/CHTjqnJ4Wef5CXHqR0TBYXTshlUn5azHcLKfhFJOYtnFlIUVYyv/uogtb20D0GTa0dHG9qpe5kKx/uP8ayDYeA0L0GqUE/Ab+PlEQ/U0eOYHZxJnOKMzhnVEZMdBcp+EUkLswqymRWUfeXex6oaWTN7qNsPFhHc1sHLe0dnGhqo2xvDS9+VAGELh+dWpjOnOJMFpxTyEUTcobltwOd3BUR6UHl8SbWHzjGR+XH+OhAHesPHKO+uY3Rmcl88bwixuel4jPD7zPOGTWCMTmpg16jTu6KiAyg/BFJXH3OSK4+ZyQQGtBu+abD/GdZOT98/fSb0KYVjmDBOSOZkB/6ADCMlKCfnNREslMTyUpJJCXR79nT0nTELyLSD9X1zdSdbP3knoJ3dx1l+abDlO2r5UzxmuAzRiQHSA74SfCHvi3kpgX51Z9f2Kc6dMQvIjJIctOC5KYFP5k+Z1QGd106nqP1oTuSHaG7jOub26hpaKGmoZljja2fnFhubGmnoyN0z0F60uBEsoJfRCQKctKC5HT6QBhKNCaqiEicUfCLiMQZBb+ISJxR8IuIxBkFv4hInFHwi4jEGQW/iEicUfCLiMSZITlkg5lVAfv6uHguUD2A5QwH8bjNEJ/bHY/bDPG53We7zWOcc3m9mXFIBn9/mFlZb8eriBXxuM0Qn9sdj9sM8bnd0dxmdfWIiMQZBb+ISJyJxeBf4nUBHojHbYb43O543GaIz+2O2jbHXB+/iIicWSwe8YuIyBnETPCb2QIz22ZmO83sfq/riRYzKzazN81si5ltMrOZ6O91AAADbUlEQVRvhtuzzWylme0Iv2Z5XetAMzO/mX1oZsvC0+PM7L3wNv/SzBK9rnGgmVmmmT1nZlvD+/zCWN/XZvY/wv+2N5rZUjNLisV9bWZPmlmlmW3s1BZx31rIj8L5tsHM5vZn3TER/GbmBx4FrgGmA7eZ2XRvq4qaNuB/OuemARcA3whv6/3A6865ScDr4elY801gS6fpfwT+NbzNtcAiT6qKrh8CrzrnpgKzCW1/zO5rMxsN3AuUOudmAH7gS8Tmvn4KWNClrbt9ew0wKfyzGPhJf1YcE8EPzAN2Oud2O+dagF8AN3hcU1Q45w455z4Ivz9BKAhGE9rep8OzPQ183psKo8PMioBrgcfD0wZcATwXniUWt3kEcBnwBIBzrsU5d4wY39eEngyYbGYJQApwiBjc1865VUBNl+bu9u0NwDMuZA2QaWaFfV13rAT/aOBAp+nycFtMM7OxwLnAe0CBc+4QhD4cgHzvKouKh4G/BjrC0znAMedcW3g6Fvf5eKAK+I9wF9fjZpZKDO9r59xB4J+A/YQCvw5YR+zv61O627cDmnGxEvwWoS2mL1cyszTg18BfOOeOe11PNJnZdUClc25d5+YIs8baPk8A5gI/cc6dCzQQQ906kYT7tG8AxgGjgFRC3Rxdxdq+7smA/nuPleAvB4o7TRcBFR7VEnVmFiAU+s86554PNx859dUv/FrpVX1RcDFwvZntJdSNdwWhbwCZ4e4AiM19Xg6UO+feC08/R+iDIJb39VXAHudclXOuFXgeuIjY39endLdvBzTjYiX41wKTwmf+EwmdDHrR45qiIty3/QSwxTn3L51+9SJwZ/j9ncBvB7u2aHHOfcs5
V+ScG0to377hnLsDeBP4Yni2mNpmAOfcYeCAmU0JN10JbCaG9zWhLp4LzCwl/G/91DbH9L7upLt9+yLw5fDVPRcAdae6hPrEORcTP8BCYDuwC3jA63qiuJ2XEPqKtwFYH/5ZSKjP+3VgR/g12+tao7T984Fl4ffjgfeBncB/AkGv64vC9s4BysL7+zdAVqzva+DvgK3ARuBnQDAW9zWwlNB5jFZCR/SLutu3hLp6Hg3n28eErnrq87p1566ISJyJla4eERHpJQW/iEicUfCLiMQZBb+ISJxR8IuIxBkFv4hInFHwi4jEGQW/iEic+f8KzmeaKMyZfAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(x, np.log(frequiences))"
   ]
  },
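  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above plots the logarithm of `frequiences`. Zipf's law predicts that a word's frequency is roughly proportional to 1/rank. On a log-log plot of frequency against rank, that relationship is a straight line with slope near -1. Below is a minimal sketch that estimates the slope with a least-squares fit. The helper name `zipf_slope` and the toy corpus are illustrative assumptions, not part of the course code:\n",
    "\n",
    "```python\n",
    "import math\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def zipf_slope(tokens, top_n=100):\n",
    "    # frequencies of the top_n most common tokens, highest first\n",
    "    freqs = [count for _, count in Counter(tokens).most_common(top_n)]\n",
    "    # Zipf's law: log(freq) is roughly c - s * log(rank), with s close to 1\n",
    "    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]\n",
    "    ys = [math.log(f) for f in freqs]\n",
    "    mean_x = sum(xs) / len(xs)\n",
    "    mean_y = sum(ys) / len(ys)\n",
    "    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))\n",
    "    var = sum((x - mean_x) ** 2 for x in xs)\n",
    "    return cov / var  # least-squares slope of log(freq) against log(rank)\n",
    "\n",
    "\n",
    "# toy corpus in which the r-th most common word occurs about 1000 / r times\n",
    "toy_tokens = []\n",
    "for rank in range(1, 51):\n",
    "    toy_tokens += ['w{}'.format(rank)] * (1000 // rank)\n",
    "print(round(zipf_slope(toy_tokens), 2))\n",
    "```"
   ]
  },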
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "def prob_1word(word):\n",
    "    # unigram probability: relative frequency of `word` in the token list\n",
    "    return words_count[word] / len(Token)"
   ]
  },
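  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`prob_1word` returns 0 for any word that never occurs in `Token`, because a `Counter` yields 0 for missing keys. A single zero then wipes out a whole product of probabilities. Add-one (Laplace) smoothing is a common remedy. Below is a minimal sketch that assumes a plain token list. `make_unigram_prob` is a hypothetical helper, not part of the course code:\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def make_unigram_prob(tokens):\n",
    "    counts = Counter(tokens)\n",
    "    total = len(tokens)\n",
    "    vocab_size = len(counts)\n",
    "\n",
    "    def prob(word):\n",
    "        # add-one smoothing: each known word plus one unknown-word slot gets an extra count\n",
    "        return (counts[word] + 1) / (total + vocab_size + 1)\n",
    "\n",
    "    return prob\n",
    "\n",
    "\n",
    "prob = make_unigram_prob(['我们', '在', '吃饭', '我们'])\n",
    "print(prob('我们'), prob('火锅'))  # 0.375 0.125\n",
    "```\n",
    "\n",
    "The unseen word '火锅' now gets a small non-zero probability instead of 0."
   ]
  },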
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob_1word(\"我们\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['此外', '自', '本周', '6', '月', '12', '日起', '除', '小米', '手机']"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Token[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "Token = [str(t) for t in Token]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['此外', '自', '本周', '6', '月', '12', '日起', '除', '小米', '手机']"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Token[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "Token_2_Gram = [''.join(Token[i:i+2]) for i in range(len(Token) - 1)]"
   ]
  },
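  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Joining two tokens into one string can merge different pairs: `'此外' + '自'` and `'此' + '外自'` both become `'此外自'`. Keeping each bigram as a tuple avoids such collisions. Pairing the list with its own tail via `zip` yields exactly `len(tokens) - 1` bigrams. Below is a sketch of this variant. `bigram_counts` is an illustrative name, not the course code:\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def bigram_counts(tokens):\n",
    "    # zip pairs each token with its successor: len(tokens) - 1 bigrams in total\n",
    "    return Counter(zip(tokens, tokens[1:]))\n",
    "\n",
    "\n",
    "tokens = ['此外', '自', '本周', '6', '月']\n",
    "counts = bigram_counts(tokens)\n",
    "print(counts[('此外', '自')], sum(counts.values()))  # 1 4\n",
    "```"
   ]
  },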
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['此外自', '自本周', '本周6', '6月', '月12', '12日起', '日起除', '除小米', '小米手机', '手机6']"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Token_2_Gram[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [],
   "source": [
    "words_count_2 = Counter(Token_2_Gram)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
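    "def prob_2words(word1, word2):\n",
    "    # relative frequency of the concatenated bigram word1+word2;\n",
    "    # unseen bigrams get a pseudo-count of 1 so that a product of these never becomes 0\n",
    "    if word1 + word2 in words_count_2:\n",
    "        return words_count_2[word1+word2] / len(Token_2_Gram)\n",
    "    else:\n",
    "        return 1 / len(Token_2_Gram)"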
   ]
  },
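  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`prob_2words` returns the joint relative frequency of a pair. A bigram language model usually conditions each word on its predecessor instead, estimating P(w2 | w1) = count(w1, w2) / count(w1). Below is a minimal sketch with add-one smoothing over the vocabulary. The smoothing choice and the name `make_bigram_cond_prob` are both assumptions for illustration:\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def make_bigram_cond_prob(tokens):\n",
    "    bigrams = Counter(zip(tokens, tokens[1:]))\n",
    "    # count each word only where another word follows it, i.e. as a history\n",
    "    histories = Counter(tokens[:-1])\n",
    "    vocab_size = len(set(tokens))\n",
    "\n",
    "    def cond_prob(w1, w2):\n",
    "        # Laplace-smoothed estimate of P(w2 | w1)\n",
    "        return (bigrams[(w1, w2)] + 1) / (histories[w1] + vocab_size)\n",
    "\n",
    "    return cond_prob\n",
    "\n",
    "\n",
    "cond_prob = make_bigram_cond_prob(['我们', '在', '吃饭', '我们', '在', '家'])\n",
    "print(cond_prob('我们', '在'), cond_prob('我们', '家'))\n",
    "```\n",
    "\n",
    "For a fixed `w1`, these smoothed probabilities sum to 1 over the vocabulary."
   ]
  },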
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.0536514065072974e-05"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prob_2words('我们','在')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.8379659911777854e-07"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prob_2words('在','吃饭')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_probablity(sentence):\n",
    "    # score a sentence as the product of the probabilities of its adjacent word pairs\n",
    "    words = cut(sentence)\n",
    "    sentence_prob = 1\n",
    "    for i, word in enumerate(words[:-1]):\n",
    "        next_ = words[i+1]\n",
    "        probability = prob_2words(word, next_)\n",
    "        sentence_prob *= probability\n",
    "    return sentence_prob"
   ]
  },
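  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Multiplying many small probabilities underflows quickly. Some generated sentences further down already score around 1e-256, which is close to the smallest normal double (about 2.2e-308). Summing log probabilities preserves the same ranking without underflow. Below is a sketch that assumes a pre-tokenized word list and any pair-probability function, such as `prob_2words` above. `log_sentence_prob` is a hypothetical name:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def log_sentence_prob(words, pair_prob):\n",
    "    # log of the product of pair probabilities, computed as a sum of logs\n",
    "    return sum(math.log(pair_prob(w1, w2)) for w1, w2 in zip(words, words[1:]))\n",
    "\n",
    "\n",
    "def tiny_pair_prob(w1, w2):\n",
    "    # toy stand-in: every adjacent pair gets probability 1e-40\n",
    "    return 1e-40\n",
    "\n",
    "\n",
    "words = ['w'] * 11  # 11 words, so 10 adjacent pairs\n",
    "direct = 1.0\n",
    "for w1, w2 in zip(words, words[1:]):\n",
    "    direct *= tiny_pair_prob(w1, w2)\n",
    "print(direct)  # 0.0: the true value 1e-400 is below the smallest double\n",
    "print(log_sentence_prob(words, tiny_pair_prob))\n",
    "```"
   ]
  },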
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_probablity('小明今天抽奖抽到一台苹果手机')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.8285698188632354e-22"
      ]
     },
     "execution_count": 155,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_probablity('洋葱奶昔来一杯')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.2216203868326836e-15"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_probablity('养乐多绿来一杯')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\nhost = 寒暄 报数 询问 业务相关 结尾 \\n报数 = 我是 数字 号 ,\\n数字 = 单个数字 | 数字 单个数字 \\n单个数字 = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 \\n寒暄 = 称谓 打招呼 | 打招呼\\n称谓 = 人称 ,\\n人称 = 先生 | 女士 | 小朋友\\n打招呼 = 你好 | 您好 \\n询问 = 请问你要 | 您需要\\n业务相关 = 玩玩 具体业务\\n玩玩 = null\\n具体业务 = 喝酒 | 打牌 | 打猎 | 赌博\\n结尾 = 吗？\\n'"
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "host"
   ]
  },
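  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `host` grammar above uses one rule per line in the form `symbol = alternative | alternative`, with symbols separated by spaces. Below is a minimal sketch that turns such a string into a rule table and expands it recursively using random choices. `parse_grammar` and `expand` are illustrative names, not the notebook's `generate_by_str`. Treating the token `null` as an empty expansion is an assumption; the generated sentences further down print it literally:\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def parse_grammar(text, sep='='):\n",
    "    # one rule per line: 'symbol = alt1 | alt2', symbols separated by spaces\n",
    "    rules = {}\n",
    "    for line in text.strip().splitlines():\n",
    "        if not line.strip():\n",
    "            continue\n",
    "        head, body = line.split(sep, 1)\n",
    "        rules[head.strip()] = [alt.split() for alt in body.split('|')]\n",
    "    return rules\n",
    "\n",
    "\n",
    "def expand(rules, symbol):\n",
    "    if symbol not in rules:\n",
    "        # terminal symbol; 'null' is assumed to mean the empty expansion\n",
    "        return [] if symbol == 'null' else [symbol]\n",
    "    words = []\n",
    "    for part in random.choice(rules[symbol]):\n",
    "        words += expand(rules, part)\n",
    "    return words\n",
    "\n",
    "\n",
    "toy_grammar = '''\n",
    "greeting = title hi | hi\n",
    "title = sir , | madam ,\n",
    "hi = hello | good morning\n",
    "'''\n",
    "rules = parse_grammar(toy_grammar)\n",
    "print(' '.join(expand(rules, 'greeting')))\n",
    "```"
   ]
  },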
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['小朋友']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['1']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['喝酒']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['小朋友']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['2']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['赌博']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['女士']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['5']\n",
      "expr ['5']\n",
      "expr ['4']\n",
      "expr ['6']\n",
      "expr ['您需要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['喝酒']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['小朋友']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['8']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['5']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['赌博']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['小朋友']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['2']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打猎']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['女士']\n",
      "expr ['您好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['3']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['打牌']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['称谓', '打招呼']\n",
      "expr ['人称', ',']\n",
      "expr ['先生']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['6']\n",
      "expr ['您需要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['赌博']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['数字', '单个数字']\n",
      "expr ['单个数字']\n",
      "expr ['5']\n",
      "expr ['6']\n",
      "expr ['5']\n",
      "expr ['2']\n",
      "expr ['7']\n",
      "expr ['9']\n",
      "expr ['9']\n",
      "expr ['6']\n",
      "expr ['5']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['喝酒']\n",
      "expr ['吗？']\n",
      "host  : [['寒暄', '报数', '询问', '业务相关', '结尾']]\n",
      "报数  : [['我是', '数字', '号', ',']]\n",
      "数字  : [['单个数字'], ['数字', '单个数字']]\n",
      "单个数字  : [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]\n",
      "寒暄  : [['称谓', '打招呼'], ['打招呼']]\n",
      "称谓  : [['人称', ',']]\n",
      "人称  : [['先生'], ['女士'], ['小朋友']]\n",
      "打招呼  : [['你好'], ['您好']]\n",
      "询问  : [['请问你要'], ['您需要']]\n",
      "业务相关  : [['玩玩', '具体业务']]\n",
      "玩玩  : [['null']]\n",
      "具体业务  : [['喝酒'], ['打牌'], ['打猎'], ['赌博']]\n",
      "结尾  : [['吗？']]\n",
      "expr ['寒暄', '报数', '询问', '业务相关', '结尾']\n",
      "expr ['打招呼']\n",
      "expr ['你好']\n",
      "expr ['我是', '数字', '号', ',']\n",
      "expr ['单个数字']\n",
      "expr ['8']\n",
      "expr ['请问你要']\n",
      "expr ['玩玩', '具体业务']\n",
      "expr ['null']\n",
      "expr ['喝酒']\n",
      "expr ['吗？']\n",
      "sentence: 小朋友 , 你好 我是 1 号 , 请问你要 null 喝酒 吗？ with Prb: 4.936749540120137e-169\n",
      "sentence: 小朋友 , 你好 我是 2 号 , 请问你要 null 赌博 吗？ with Prb: 4.936749540120137e-169\n",
      "sentence: 女士 , 你好 我是 5 5 4 6 号 , 您需要 null 喝酒 吗？ with Prb: 4.572670321626677e-208\n",
      "sentence: 小朋友 , 你好 我是 8 号 , 请问你要 null 打牌 吗？ with Prb: 4.936749540120137e-169\n",
      "sentence: 您好 我是 5 号 , 请问你要 null 赌博 吗？ with Prb: 4.7565532670112375e-140\n",
      "sentence: 小朋友 , 你好 我是 2 号 , 请问你要 null 打猎 吗？ with Prb: 4.936749540120137e-169\n",
      "sentence: 女士 , 您好 我是 3 号 , 请问你要 null 打牌 吗？ with Prb: 4.936749540120137e-169\n",
      "sentence: 先生 , 你好 我是 6 号 , 您需要 null 赌博 吗？ with Prb: 1.367561280797495e-164\n",
      "sentence: 你好 我是 5 6 5 2 7 9 9 6 5 号 , 请问你要 null 喝酒 吗？ with Prb: 5.519341722732793e-256\n",
      "sentence: 你好 我是 8 号 , 请问你要 null 喝酒 吗？ with Prb: 4.7565532670112375e-140\n"
     ]
    }
   ],
   "source": [
    "for sen in [generate_by_str(host, target=\"host\", split='=') for _ in range(10)]:\n",
    "    print('sentence: {} with Prb: {}'.format(sen, get_probablity(sen)))"
   ]
  },
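  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above prints a score for every generated sentence. A natural next step is to generate several candidates and keep the highest-scoring one. Below is a minimal sketch with toy stand-ins for the generator and the scorer; in the notebook, `generate_by_str` and `get_probablity` would fill these roles. `pick_best`, `random_digits` and `toy_score` are hypothetical names:\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def pick_best(generate, score, n=10):\n",
    "    # draw n candidates and keep the one with the highest score\n",
    "    candidates = [generate() for _ in range(n)]\n",
    "    return max(candidates, key=score)\n",
    "\n",
    "\n",
    "def random_digits():\n",
    "    # toy generator: a string of 1 to 9 random digits\n",
    "    return ''.join(random.choice('123456789') for _ in range(random.randint(1, 9)))\n",
    "\n",
    "\n",
    "def toy_score(sentence):\n",
    "    # toy score that, like a raw probability product, prefers shorter strings\n",
    "    return -len(sentence)\n",
    "\n",
    "\n",
    "print(pick_best(random_digits, toy_score, n=20))\n",
    "```"
   ]
  },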
  {
   "cell_type": "code",
   "execution_count": 161,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "今天晚上请你吃大餐，我们一起吃日料 is more likely\n",
      "---- 今天晚上请你吃大餐，我们一起吃日料 with probability 1.8889745916921315e-66\n",
      "---- 明天晚上请你吃大餐，我们一起吃苹果 with probability 1.5111796733537052e-66\n",
      "真是一只好看的小猫 is more likely\n",
      "---- 真事一只好看的小猫 with probability 4.5242555959271015e-34\n",
      "---- 真是一只好看的小猫 with probability 7.970947520145384e-27\n",
      "今晚我去吃火锅 is more likely\n",
      "---- 今晚我去吃火锅 with probability 3.401139863085618e-20\n",
      "---- 今晚火锅去吃我 with probability 5.396995716765105e-28\n",
      "养乐多绿来一杯 is more likely\n",
      "---- 洋葱奶昔来一杯 with probability 1.8285698188632354e-22\n",
      "---- 养乐多绿来一杯 with probability 3.2216203868326836e-15\n"
     ]
    }
   ],
   "source": [
    "need_compared = [\n",
    "    \"今天晚上请你吃大餐，我们一起吃日料 明天晚上请你吃大餐，我们一起吃苹果\",\n",
    "    \"真事一只好看的小猫 真是一只好看的小猫\",\n",
    "    \"今晚我去吃火锅 今晚火锅去吃我\",\n",
    "    \"洋葱奶昔来一杯 养乐多绿来一杯\"\n",
    "]\n",
    "\n",
    "for s in need_compared:\n",
    "    s1, s2 = s.split()\n",
    "    p1, p2 = get_probablity(s1), get_probablity(s2)\n",
    "\n",
    "    better = s1 if p1 > p2 else s2\n",
    "\n",
    "    print('{} is more likely'.format(better))\n",
    "    print('-'*4 + ' {} with probability {}'.format(s1, p1))\n",
    "    print('-'*4 + ' {} with probability {}'.format(s2, p2))"
   ]
  },
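  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing raw products favors short sentences, because every extra word multiplies in another factor below 1. In the generation output above, for example, the sentence with a nine-digit number scores far lower than those with a single digit. When candidates differ in length, a per-word measure is a fairer comparison. Perplexity, exp(-(1/N) * sum(log p_i)) over the N word pairs, is one such measure. Below is a sketch that assumes the pair probabilities are already computed. `product` and `perplexity` are hypothetical helpers:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def product(values):\n",
    "    result = 1.0\n",
    "    for v in values:\n",
    "        result *= v\n",
    "    return result\n",
    "\n",
    "\n",
    "def perplexity(pair_probs):\n",
    "    # exp of the negative mean log probability; lower means the model finds it more fluent\n",
    "    return math.exp(-sum(math.log(p) for p in pair_probs) / len(pair_probs))\n",
    "\n",
    "\n",
    "short = [0.01, 0.01]               # 2 pairs, product 1e-4\n",
    "longer = [0.05, 0.05, 0.05, 0.05]  # 4 pairs, product 6.25e-6\n",
    "print(product(short) > product(longer))        # True: the raw product prefers the shorter sentence\n",
    "print(perplexity(short) > perplexity(longer))  # True: per word, the longer sentence scores better\n",
    "```"
   ]
  },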
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
